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Abstract 



^D ' Hyperbolic Programming (HP) -minimizing a linear functional over an affine sub- 

^^ ■ space of a finite-dimensional real vector space intersected with the so-called hyperbol- 

icity cone- is a class of convex optimization problems that contains well-known Linear 
\^ I Programming (LP). In particular, for any LP one can readily provide a sequence of 

s-^ ■ HP relaxations. Based on these hyperbolic relaxations, a new Shrink-Wrapping ap- 

,j^ . proach to solve LP has been proposed by Renegar. The resulting Shrink-Wrapping 

j^ I trajectories, in a sense, generalize the notion of central path in interior-point methods. 

We study the geometry of Shrink-Wrapping trajectories for Linear Programming. 
In particular, we analyze the geometry of these trajectories in the proximity of the 
so-called central line, and contrast the behavior of these trajectories with that of the 
^ ' central path for some pathological LP instances. 

^^ ■ In addition, we provide an elementary real proof of convexity of hyperbolicity cones. 

in 

y^ '. 1 Introduction 

o, 

^D . We consider LP in its standard form 

minjc'^x : Ax = b,x e M+} 

•1—1 X ~^ 

X 

$H ' where c G W^,b £ W^,A E M™^", m < n, and W] denotes n-dimensional nonnegative 



orthant. LP is paramount in many applications of mathematical programming today. 
Amongst numerical methods employed to solve LP instances in practice, the two 
most notable classes are the so-called pivot-type methods and the interior-point meth- 
ods. For a given method, of particular theoretical and practical interest is dependence 
of number of iterations (and elementary arithmetic operations) required to solve an LP. 
Assuming that the LP data, namely, the triple {A, b, c} are rational, one may measure 
its bit input complexity as the number of {0, l}-bits required to store the data. If any 
LP instance may be solved by the method in at most polynomial number of arithmetic 
operations in m, n and L, such a method is called polynomial time algorithm for LP. 
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Pivot-type methods, such as simplex method, follow the faces of the polytope that 
define the problem's feasible region {x : Ax = b,x £ M"}. Although, some variants of 
pivot-type methods require polynomial number of iterations on average, and perform 
well in practice, it is not known wether there exists a polynomial time algorithm within 
this class. 

In contrast, the interior-point methods follow some continuous trajectory typically 
inside the feasible region; many variants of these methods are known to be polynomial 
time algorithms and perform well in practice especially for very large (in terms of m 
and n) LP instances. 

A variant of the question of wether there exists a strongly-polynomial algorithm for 
solving LP -in simple terms, the algorithm whose running time would depend only on 
m and n- was cited by S. Smale as on of the 18 greatest unsolved problems of the 21st 
century, and "is the main unsolved problem of linear programming theory." 

Linear Programming, when viewed from the point of view of the so-called hyperbolic 
polynomials, exhibits rich and beautiful algebraic structure, which seems not to be 
exploited yet by any of the existing methods. It is our motivation to take advantage 
of this structure in hope to develop potentially more efficient methods to solve this 
optimization problem. 

The extensive study of hyperbolic polynomials begins with the work of Lars Garding 
|13j . which dates back to 1950's, in the context of partial-differential equations; here 
the author established a number of important results about the hyperbolic polynomials 
including the convexity of the associated hyperbolicity cones. The notion of hyperbolic 
programming was first introduced in [15]; here the author demonstrated, in particular, 
that the hyperbolic programming problems can be efficiently solved using the interior 
point methods, and gave a first characterization of the hyperbolicity cones as a set of 
polynomial inequalities (although, quite different and more complicated then the one 
in [19] that we partially rely on). Further study of hyperbolic polynomials in the context 
of convex optimization was done by the group of authors of [2]; a number of important 
observations were made regarding the connections of hyperbolic polynomials with the 
symmetric functions, and in particular, the elementary symmetric functions. This 
latter reference is an excellent introduction to hyperbolic polynomials in the context 
of mathematical programming. This line of research was continued in [19], where 
many important properties of the boundary of the hyperbolicity cones are revealed 
together with the relevance of the so-called hyperbolic derivative cones. In the new 
paper [20] generalized trajectories for solving hyperbolic programming problems based 
on hyperbolic relaxations are introduced. 

We study the so-called Shrink- Wrapping algorithm for LP by analyzing the local 
behavior of its trajectories. In Section [2] we review the notions of hyperbolic polyno- 
mial and hyperbolicity cones giving the first proof of Garding's key result on cones' 
convexity that does not rely on complex variables; in Section [3] we introduce Shrink- 
Wrapping for LP; in Section|4]we analyze the behavior of Shrink- Wrapping trajectories 
in the proximity of a certain invariant set that contains optimal LP solution and, as 
a consequence, describe a simple idealized locally super-quadratically convergent dis- 
crete bi-section scheme; in Section[5]we contrast Shrink- Wrapping trajectories with the 
so-called central path for some pathological LP instances. 



2 Basics 

2.1 Hyperbolic polynomials and hyperbolicity cones 

Mainly, we follow the exposition of elementary properties of hyperbolic polyno- 
mials found in |19j . However, unlike the original proof of convexity of hyperbolicity 
cones in [13] or its later version, e.g., in [19], our approach is rather geometric and 
does not rely on complex numbers, thus, bringing it closer to the spirit of continuous 
optimization. 

Let X be a finite-dimensional real vector space. Recall a polynomial p : X — )• M is 
homogeneous of degree m if p[tx) = f^p{x) for all i € M and every x ^ X. 

Definition 2.1. Let p : X — t- ]R be a homogeneous polynomial of degree m and d £ X 
is such that p{d) > 0. p is hyperbolic with respect to d if the univariate polynomial 
1 1— 7- p{x + td) has 771 real roots for every x & X. 

Examples: 

• X = M™, d = 1 G M"^ ~ the vector of all ones. The m}^ elementary symmetric 
polynomial Em[x) = Y\^iXi is a hyperbolic polynomial with respect to 1, since 
1 1—)- Em{x + tl) = XYiLi {xi + t) has roots — Xj, i = 1, . . . , m, 

• X = S™ - the space of real symmetric m x m matrices, d = I - the identity 
matrix. The determinant det(x) is a hyperbolic polynomial with respect to /, 
since the eigenvalues oi x € X are minus the roots of t i— )• det(x + tl) and are 
real. 

By analogy with the last example, given a hyperbolic polynomial p and its hyperbolicity 
direction d, the roots of A i— )■ p{x — Xd) are called the eigenvalues of x in direction d, i.e., 
the eigenvalues are precisely the roots of t i— )• p{x + td) with signs reversed. Ordering 
eigenvalues in non-decreasing order, we denote them by 

Ai(x) < X2{x) < ■■■Xmix). 

Remark 2.2. If a homogeneous polynomial is such that 1 1— t- p{x + td) has m real roots 
for every x £ X but p{d) < 0, then —p{x) is hyperbolic with respect to d. Therefore, 
our definition may have been augmented to require only p(d) ^ 0. In turn, p{d) 7^ is 
an essential requirement to preserve the resulting cone's convexity and thus may not 
be further relaxed; for an illustrative example see |19j . 

Recall that a set is a cone if it is closed under multiplication by nonnegative reals. 

Definition 2.3. The hyperbolicity cone of p with respect to d, written C{d), is the set 
{xeX :p{x + td) / 0, Vt > 0}. 

We omit p from the notation above as it will be clear which polynomial we refer to. 
C{d) is a cone by homogeneity of p. 
Examples: 

• X = W^, d = 1, p{x) = Em{x), then C{d) = WJ^^ is strictly positive orthant. 



• X = S"*, d = I, p{x) = det(2;), then C{d) is the cone of positive definite matrices. 

Proposition 2.4. Given a hyperbolic polynomial p and its hyperbolicity direction d, 

(A) for fixed real a > we have Xi{ax) = aAj(x), i = 1, . . . ,m, 

(B) for fixed real (3 we have Aj(x + I3d) = Xi{x) + f3, 

(C) if e £ C{d), then the linear segment [e,d] C C{d), and, more generally, [ye,6d] C 
C{d) for any 7, 5 > 0. 

Proof. Part A follows immediately from the definition of eigenvalues. Just as the case 
of symmetric matrices, part B is readily established by a simple regrouping of variables 
p{{x + I3d) — Xd) = p{x — (A — /3)d). Part C follows from A and B: observe that for any 
^ G (0, 1) we have 

A^(Ce + (1 - m = eAi (e + ^d\ = ^ (xiie) + ^) > 0, 

for all i = 1, . . . ,m, since Aj(e) > 0, and so ^e + (1 — ^)d G C{d); similarly, a more 
general statement follows. D 

As a straightforward consequence, we can make two important observations. 

Proposition 2.5. C{d) = {x e X : Xi{x) > 0}. 

Proof. Follows from Proposition 12.41 part B and p{x) = p{d) YYl^i Xi{x) , where the 
coefficient p{d) > in the identity is a consequence of considering limi-|-oo p{x + td) and 
homogeneity of p. D 

Proposition 2.6. C{d) is a (linearly) connected component of {x € X : p{x) > 0} 
containing d. 

Proof. From the previous proposition it follows that C{d) <Z {x ^ X : p{x) > 0}. 
Clearly Xi{d) = 1 for all i = 1, . . . ,m, so d G C{d). To establish connectivity of C{d), 
for e, / G C{d) observe [e, d\ U [d, /] C C{d) by Proposition [231 part C. D 

Another important conclusion to be made from Proposition l2.4l is that the cone C{d) 
is defined locally around its hyperbolicity direction d. That is, in order to describe C{d) 
it suffices only to know the behavior of the eigenvalues in the small ball around d, while 
the properties A, B and C tell us explicitly how to compute the boundary of the closure 
of the cone C{d) having this information. For ease of reference, we distill the above 
mentioned computational procedure into the following statement. 

Proposition 2.7. For fixed real u, the roots ti of t ^ p{d + uj{e — d) + td) satisfy 

ti = oj{l - Xi{e)) - I. 

Proof Observe p{d + uj{e - d) + td) = p{uje + (1 - a; + t)d) =uj'^p{e + ^^^^d) . D 

That the cone is defined locally is also a straightforward consequence of analyticity 
of p. This localization around d is a key principle that allows us to establish the 
convexity of C{d) as a corollary to the following important property. 
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Theorem 2.8. If e ^ C-{d) then p is hyperbolic with respect to e and C(e) = C{d). 



Example: 

• X = S^, d = I, p{x) = det(x); note that not every element e G X, p{e) > 0, 
gives rise to a hyperbolicity direction, e.g., consider eigenvalues of a linear matrix 
pencil corresponding to det(^ — XB) = where 



A 



/ 1 \ 

10 

10 

V 1 / 



B 



/ 1 1 \ 

10 0-1 

10 

V -1 / 



The eigenvalues A are iti each with multiplicity two, although det(i?) = 1. (We 
would like to thank Prof. Peter Lancaster for providing this example.) Interest- 
ingly, in case of X = S^ the condition det{B) > suffices to ensure that all the 
eigenvalues of a matrix pencil A — XB are real, as it amounts to B being either 
positive or negative definite - the latter is a standard sufficient condition for the 
linear matrix pencil spectra to be real, which may be easily established by say 
pre and post-multiplying A — XB by inverse Cholesky factors of -B or —B. 

Given a homogeneous polynomial p, an intriguing question is to characterize all e G X 
giving hyperbolicity directions to p, if such exist. 

Before we prove Theorem 12.81 we start with more elementary but illustrative exer- 
cise of showing that one may perturb d ever so slightly maintaining hyperbolicity of p. 
This proof, with some minor modifications, essentially carries over to the proof of our 
theorem in question. 

Proposition 2.9. Given a hyperbolic polynomial p and its hyperbolicity direction d, 
there exists e > such that for any A, ||A|| < e, polynomial p is hyperbolic with respect 
to d + A. 

Proof. By Proposition 12.41 parts A and B, it suffices to show that there exists an 
open neighborhood around d = d + A such that for any point x in this neighborhood 
t I—)- p{x + t d) has all real roots; any point z € X may be shown to have associated 
roots of 1 1— 7- p{z + td) real by translating z along vector d to a properly scaled version 
of this neighborhood, where every point in the neighborhood including d is carried into 
its multiple by some fixed positive constant, see Figured) 

Consider p of degree m, hyperbolic with respect to d. Clearly Xi{d) = 1, i = 
1, . . . ,m. Let Bg{d) be an open ball of radius e > around d such that Vy G Bs{d) we 
have |Aj(y) — 1| < ^, i = 1, . . . ,m. Such a ball exists by continuity of Aj. 

Fix A G X, ||A|| < e, and consider a mapping 

T H> p{x + tA + td) 

for each fixed x and real r producing a polynomial in t. Observe that for any x G 
BL{d + A) = {xeX:x = {d + A) + 6,\\6\\ < §} and r G [-3/2,-1/2] we have 



\x + tA- d\\ = \\{d + A) + 6 + TA-d\\< 



11 Ml A II ^ ^ 

|l + r|||A||<- + -=e. 



Open "real roots" 
neighborhood of d 



Scaled version of the 
neighborhood; use A 




Translate of z by a 
multiple of (/ ; use B 



Figure 1: Localization of real roots around d 



So, for any fixed x € B£{d+A) and r G [—3/2, —1/2], a polynomial 1 1— )■ p{{x+TA)+td) 
has all real roots in the interval [—3/2, —1/2] by the choice of e. 

Now, in order to show that 1 1— )• p{x + t{d + A)) has all real roots Vx G B^ (d + A), 
start increasing r in t i— )• p{x + tA + td) from -3/2 to -1/2, see Figure [21 whenever 
t = T intersects t = Aj(x + rA), we capture one of the desired real roots, increasing r 
until we extract m roots. D 

Note that the proof of the proposition does not rely on the initial neighborhood of 
d being a ball. Thus, we extend the proof to establish hyperbolicity of p with respect 
to e £C{d). 

Proof of Theorem \2.8[ To establish hyperbolicity of p with respect to e G C{d), just as 
before, it suffices to show that there is an open neighborhood of e such that 1 1— )• p{x+te) 
has m real roots for all x in the neighborhood. 

By homogeneity of p, without loss of generality we may assume < 27 < Aj(e) < 
1 — 27 < 1, i = 1, . . . ,m. By Proposition 12.71 for e' = d — {e — d) we have < 27 < 
Xi{e) < 1 — 27 < 1, i = 1, ... ,771, and so the linear segment [e, e'] belongs to C{d) by 
Proposition 12.41 part C. Moreover, by Proposition 12.71 and continuity of Aj there exists 
e > such that the open "tubular" neighborhood M around [e, e'] consisting of convex 
combination of two open balls of radius e around e and e' , M = conv{B^{e),B^{e')), 
satisfies Vy G A/" we have Aj(y) G (7, 2 — 7), i = 1, . . . , m. 



t - -A,)(x+ rA) 



t- -A,2(^+ ^^) 



I / - -}^3(x+ rA) 




Figure 2: Identifying m real roots of t i— )■ p(a; + t d) 



Consider A = e — d and r i— )• ^(x + rA + id). Note x + rA G M for all x S B^{e) and 
r G [—2, 0]. So, for any fixed x € -Be(e) and r G [—2, 0] the polynomial 1 1— )• p(x + rA + 
td) = p{x + r(e — d) + td) has m real roots in the interval (—2 + 7, —7). Therefore, by 
increasing r from -2 to we may identify m real roots of 1 1— )■ p(x + t((i + A) = p{x + te) 
as intersections of t = r and t = —Xi{x + rA). 

Finally, that C{e) = C{d) easily follows from Proposition 12.61 D 



We are in position to prove the convexity of C{d); as observed in |19j this is a 
consequence of Theorem 12.81 we restate the proof for completeness. 

Theorem 2.10. C{d) is an open convex cone. 

Proof. C{d) is an open set by continuity of p and Proposition 12.61 Consider x,y £ C{d). 
Note C{y) = C{d) and so by Proposition 12.41 part C we have [x,y] G C{y) = C{d). D 

In [13] the convexity of C{d) was established as a corollary to the following result. 

Fact 2.11. Ai(x) is a concave function of x. 

Indeed, \i x,y G C{d) and Xi{x) is concave, then for any ^ G (0,1) we have Ai(.^3; + 
(1 — £^)y) > i\i{x) + (1 — C)^i{y) > 0- Later in [18] it was shown that conversely, the 



concavity of Xi{x) follows from convexity ofC{d), using much simpler proofs yet still 
relying on complex numbers. If we introduce sums of the smallest k eigenvalues 

k 

■sfc = y^ At 



Hi 



a more general statement regarding the eigenvalues may be established [2]. 
Fact 2.12. Sk{x) is a concave function for any k = 1, . . . , m. 

Next, we turn our attention to the so-called hyperbolic derivatives. 

2.2 Hyperbolic derivatives and cone characterization 

Given a hyperbolic polynomial p of degree m and its hyperbolicity direction d, the 
directional derivative of p(x) along d is called the hyperbolic derivative polynomial of p 
with respect to d, denoted 

p'{x) =p[{x + td)\t=o. 

We refer to p' simply as the derivative polynomial of p, omitting d for brevity of 
notation. By the root interlacing property for the polynomials with all real roots - 
by continuity, for fixed x between any two roots of t i— t- p{x + td) there is a root of 
1 1— )• p[{x + td)- it follows that p'{x) is also hyperbolic with respect to d. 

Similarly, for a fixed hyperbolicity direction d, we can define higher derivatives 
p",p"' , ■ ■ ■ ,p^"^' as p^^>{x) = p\ (x + td)\t=Q. Since p is of degree m, p(™~^) is linear 
and p^'^'{x) is constant. 

Examples: 

• X = M™, d = 1, p{x) = Em{x), then 

E^J^\x) = k\Err^^k{x) 

where Ej {x) is the j elementary symmetric polynomial 

-^l(^) = Z^l<j<m^«i E2(X) = 2^i<j<j<m^«^i' • • • ' Em[X) = lli<j<m^i) 

• X = M'", d E 1R++, pix) = Em{x); by a similar inductive argument as above one 
can show that 



E^JlHx) = klE„^{d)E, 



m—k 



Xl X2 Xyyi 

dl ^2 dm 



Remark 2.13. The elementary symmetric polynomials in the example above play an 
important role in representing the derivative polynomials via the eigenvalues at x £ X. 
Namely, since p{x + td) = p{d) ni<i<m,(^ + \{x)) we have 

p\^)=^Ap{d) n (i + A.(x)) 



dt . 

l<i<m 



= Pid) E Yl^jix)=P{d)EmMKx)) 

where A(x) is the vector of m eigenvalues of x, and more generally 

p^''Hx) = k\pid)Em-kiHx))- 



For the k^^ hyperbolic derivative of p, we use C^'^^d) to denote the associated 
hyperbohcity cone; note C^™~^' (d) is an open half-space and C^"^~^' (d) = X. Although, 
C(e) = C{d) for any e € C{d), the hyperbohcity cones corresponding to derivative 
polynomials p' ,p" , . . . with respect to e 7^ d might not coincide with one another, as in 
the last example where, for instance, k = m — 1. 

It turns out that these derivative polynomials come in handy in characterization 
of the hyperbohcity cone C{d) itself as observed in [19]. We note that if all Aj(x) > 
then clearly p^^'{x) > for all A; = 1, . . . ,77i. Conversely, by Taylor series of p{x + td), 

J.2 j.m—1 j.m 

p^a: + td)=p{x)+p'{x)t + p"{x)- + ...+p(^-^\x)- ttt +p('")(x)^, 

observe that if all p^^'{x) > 0, then p{x + td) > for all t > 0, and thus x G C{d). 
Fact 2.14. The hyperbolicity cone satisfies 

C{d) = {xeX : p{x) > Q,p{x) > 0,p"{x) > 0, . . . ,p('"-^)(x) > 0}. 
As an important consequence of this fact we have the following cone inclusion. 

Corollary 2.15. 

C{d) c C'{d) c C"{d) c . . . c C^^^i^d). 

Throughout the rest of the manuscript we will be concerned with the closure of 
a hyperbolicity cone, cl C{d). Due to continuity of p{x) all the results in this and 
previous subsections naturally extend to cl C{d) by replacing strict inequalities with 
corresponding inequalities when necessary. To this end, we note that, for example, the 
closed cone clC(d) = {x e X : Ai(x) > 0} = {x e X : p^^\x) > 0, A; = 1, . . . , m - 1} C 
clC'{d) C ••• C clC^'^~^'{d) is convex, etc.; likewise, one may easily characterize the 
boundary of cl C{d) as follows, see |19j . 

Corollary 2.16. d {cl C{d)) = {x £ X : p{x) = 0,p'(x) >Q,...,p^'^-^\x) > 0}. 

Proposition 2.17. If x e clC^''\d)i^clC^''+^\d) for some r > 0, then x G clC(d). 

Proof. By the inclusion property for derivative cones, x must belong to the boundary of 
both cones, x G 9 (clC^''^(d)) P| 9 (clC^''"'"^)(d)). Consequently, by the root interlacing 
property for polynomials with all real roots, it follows that is a root of multiplicity 
fi> 2 corresponding to 1 1— >• p^'^>{x + td): by contradiction, if has multiplicity 1, then 
the derivative polynomial t 1— )• p^^^^'{x + td) cannot have as its root. Analogously, 
if r > 1 then t 1— ?• p(''~^)(^+*") must have as its root of multiplicity ^ + 1, etc. So, 
indeed, x G clC((i), and, in particular, x lies on the boundary of the cone. D 

Example: 

• X = M", d G IR++, p{x) = En{x); let JC^^d denotes the closure of the hyperbohcity 
cone associated with r^^ derivative polynomial of p with respect to d, /C^.d = 
cl C^^'{d). The following cone inclusion 



gives a natural sequence of relaxations of the nonnegative orthant M" , a piv- 
otal observation for building Shrink- Wrapping framework for linear programming. 
Note that K^.d coincides with the closure of hyperbolicity cone associated with 
En-r{x./d), where x./d is a componentwise ratio of vectors x and d; observe 
Kr,d = {x ^ M" : X = d. • 2, z G /Cr,i}i where d. ■ z \s a. componentwise product of 
two vectors; in particular, K^-i^d is a half-space passing through the origin with 
normal vector l./d. 

Remark 2.18. Interestingly, for x £ d{clC{d)) we have VAi(x) parallel to Vp{x). 
Let X = M"; considering x £ d{clC{d)) so that = Ai(a;) < A2(a;), recall p{x) = 
Pid)Uj=i-,n>'ji^) and so 



j=l,m j=l,m, kf^j fc^l 



x) 



givmg 

Vp{x)=p{d)llXk{x)-VXiix). 

Similarly, if i i— t- p[x + td) has as its root of multiplicity /^ > 1, one may consider the 
boundary of the corresponding derivative cone c\C^^~^'{d) instead. 

Although, there is a simple algebraic characterization of the hyperbolicity cones, 
their dual cones are poorly understood, with some exceptions, e.g., [71l22j. 

2.3 Hyperbolic programs and relaxations 

The significance of hyperbolicity cones in convex optimization becomes evident once 
we introduce the three most prominent instances of the so-called conic programming 
problems. Letting X be equipped with an inner product (•,•), a conic programming 
problem is an optimization problem of the form 

inf{(c, x) : Ax = b,x £ K} 

X 

where iT C X is a closed convex cone, c £ X, b £ R"^ and A : X ^ M"* - a linear 
operator. It is well known that any convex optimization problem can be recast as conic 
programming problem. 

The three most prominent instances of conic programming are: 

• LP, X = M", {x, y) = x^y, K = M!^, 

• Second-Order Conic Programming (SOCP), X = W\{x,y) = x y,K = Ki x 
K2 X ■ ■ ■ X Ki with second-order cones Ki = {{x,t) £ M"'~^ x M : ||x|| < t}, 
X]j=i ni = n, and 

• positive Semi-Definite Programming (SDP), X = S"*, {x,y) = trace(xy) and K 
- the cone of positive semi-definite matrices. 
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In applications, these three types of problems provide an extremely powerful modeling 
framework, ranging from production planning, relaxations to hard combinatorial prob- 
lems, mathematical finance and Markov chains, to control theory and polynomial pro- 
gramming [8] , [H] , [1] , [21] , [3] , [IS] . Also, they naturally arise as robust counterparts [3] 
to one another in the presence of uncertainty in the initial data, e.g., [6]. 

A Hyperbolic Programming (HP) problem is a conic programming problem where 
if is a closure of hyperbolicity cone. Note LP, SOCP and SDP are instances of HP. 

Remark 2.19. When implementing an interior-point method for SDP it is frequently 
required to determine how far one may advance along a given vector h G S'" from some 
point e G §"^ in the cone of positive definite matrices, before hitting the boundary of 
the closure of this cone. Typically, the procedure is considered to be computationally 
expensive due to its implementation as "trial and error" testing on whether a given 
vector e + coh is still in the cone, a; £ M. Theorem 12.81 combined with Proposition 12.71 
gives an elegant basis for an alternative relatively inexpensive procedure. Note that 
with respect to det(-), C(e) coincides with the cone of positive definite matrices. Using 
Cholesky factors of e = LL^, one may compute the largest eigenvalue oi L~^{e — h)L~^ 
or its approximation, say, using Lanczos-type algorithm provided e — h is also positive- 
definite, and subsequently use this value to determine the maximum allowed step-length 
along h using Proposition 12. 7i 

In what follows, within Shrink- Wrapping framework, together with a linear pro- 
gramming instance 

minic^x : Ax = b,x e M+} (LP) 

X 

where c G M", b G M"^, A G M™^", m < n, we consider its r*^ hyperbolic relaxation with 
respect to some fixed d G K++, < r < n — 1, 

min{c X : Ax = b,x ^ )Crd} (HPrd) 

X ' ' 

recalling that JCr^d is the closure of hyperbolicity cone corresponding to r^^ hyperbolic 
derivative r\En{d)En-r{x- / d) of En{x) with respect to d. Let x* and x{d) denote 
optimal solutions for LP and HP^^d respectively; for convenience, we are assuming x* 
is a unique minimizer for LP. 
Example: 

• consider linear programming problem in.mx{c^x : \^x = 3, x G IR+} together 
with its first-order hyperbolic relaxation min^-jc x : \ x = 3,x G /Ci^^} where 
d = 1, see Figure [3l note that the feasible region of HPi\ is inscribed by a circle 
in {x G M^ : \^x = 3} centered around d = 1. 

Although, the above example is fairly simple, it illustrates a few key geometric concepts 
of Shrink- Wrapping throughout the manuscript . 

Hyperbolic relaxations HP^^d will be used to define a family of continuous trajecto- 
ries terminating at the LP optimum. In a similar fashion, one may define hyperbolic 
relaxations for any other HP instance besides LP, including SOCP and SDP. 
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Figure 3: LP and its hyperbolic relaxation 

3 Shrink- Wrapping approach for LP 

3.1 Main ingredients 

Proposition 3.1. If bounded, HPr^d has a unique solution x{d) unless x{d) solves LP. 

Proof. The boundary of fCr^d at x has strict curvature except along x itself, unless 
X £ R", see |19] . Theorem 14; note that the only flat faces of ICr^d ^^^ precisely 
n — r — 1 and lower dimensional faces of the nonnegative orthant, since in the latter 
case En^r{x./d) =0. D 

We are interested in recovering LP solution using hyperbolic relaxations. Although, 
our present investigation is mostly of theoretical nature, we would like to comment on 
practicality of the underlying assumptions to indicate potential usability of this new 
setting. To this extent, we assume that 

(A) LP is bounded, 

(B) we know an initial strictly LP- feasible point c? G {x € M" : Ax = b,x £ M"^}, 

(C) the corresponding hyperbolic relaxation HP^^d is bounded as well, 

(D) x{d) is not LP-optimal, 

(E) we can easily solve HPj.^d to find x{d). 
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Hypotheses (A) and (B) are fairly standard assumptions for linear programming, in 
particular in the context of interior-point methods. In fact, instead of (A) and (B) one 
frequently relies on even more restrictive hypothesis (Al) that the feasible region of 
LP is bounded, and (Bl) that there is an affine feasible x E K++ : ^x = h and the 
LP is strictly feasible, i.e., remains feasible under all infinitesimal perturbations oih - 
the latter implied, for example, by having rank(74) = m and a feasible point x € ^\+- 
Note that even stronger (Al) and (Bl) are quite reasonable from a practical point of 
view: if LP is used to model a certain physical phenomenon, it is natural to assume 
the compactness of its feasible region; in addition, a well thought through model is 
typically feasible and avoids unnecessary state variables and constraints, leading to 
strict feasibility. Also, with regards to recovering the optimum, (A) and (Al) are not 
that much different from one another: (Al) clearly implies (A), conversely, if val is 
an a priori bound on the optimal value of LP, then we might as well augment the 
feasible region of LP by adding a constraint \^x < n'-^fr, thus making it compact. 
Shortly we will indicate that for all reasonable LP instances, accommodating (C) should 
not pose significant practical difficulties either; here, by a reasonable LP instance we 
understand the feasible problem satisfying (Al). We use (D) since otherwise we solved 
LP already; henceforth, we refer to x{d) as the solution of HP^^d- Since our focus is on 
analyzing continuous Shrink- Wrapping trajectories, we employ (E); in more practical 
terms, one may think of setting up a Newton's method based path-following scheme, 
e.g., similar to the so-called short-step interior-point method, to recover a sufficiently 
good approximation to x{d). 
Example: 

• consider mina:{(l, 1,0)-^ X : \^x = 3,x G Mi^} and its relaxations mina,{(l, 1, 0)^x : 
l^x = 3,x e ICi^d} for three choices of d € M++: (a) d = 1, (b) d = (1, .1, 1.9), 
(c) d = (.1, .1,2.8). All d are chosen affine feasible, l^d = 3; x* = (0,0,3) is LP 
optimum. 

Observe 2E2{x./d) = (x./(i)^(ll^ - I){x./d). The boundary of HPi^d feasible 
region corresponds to E2{x./d) = where 1 x = 3. So, for affine feasible x in 
the basis of xi,X2 the boundary satisfies 

(x./d)^(ll^ - I){x./d) = \i^Qi + r'^e + s = 

where ^ = (xi, X2), and denoting Diag(z) the diagonal matrix with Diag(z)j^j = Zi, 

1 \ ^ / 1 

Q = 2| 1 Diag(l./d)(ll^-I)Diag(l./(i) 1 |, 

1 \^ / 

r = 2( 1 Diag(l./(i)(ll^-/)Diag(l./d) |, 

0\^ /O 

s= I Diag(l./d)(ll^-/)Diag(l./d) 
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In turn, for affine feasible d > writing d^ = 3 — di — d2 we have 

3 / -2d2 3 - 2{di + d2) \ 3Q 



Q 



3-2(di + d2) 
did2{3 - di - d2) \ 3 - 2{di + d2) -2di J ~ did2{3 - di - d2) ' 



Let us analyze the sign pattern for the eigenvalues of Q; observe Q has at least 
one negative eigenvalue as its diagonal is negative as well, also 

det(Q) = -9 + 12{di + d2) - 4((i? + dl) - 4did2. 

Note that for (a) where di = d2 = 1, the matrix Q is negative definite, and thus 
the boundary of HPi^ feasible region indeed corresponds to an ellipse. In both 
cases (b) and (c), where di = 1,^2 = -1 or di = d2 = -1, we have det(Q) < 
and so the boundary corresponds to a branch of hyperbola. In fact, for any 
sufficiently small di,d2 the boundary assumes hyperbolic shape, in particular, 
when d approaches LP optimum, d ^ x*; see Figure HI Note that in both cases 
(a) and (c) x{d) = x* with the corresponding hyperbolicity directions belonging 
to an open line segment C which extends to x* . 




x*=x(^W) = ;c(£;W) = 



Figure 4: HPi ^ in the basis of Xi , X2 with varying d 



From this we make an important observation: although LP has a bounded feasible 
region, the feasible region corresponding to HPr d may become unbounded. 
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Next, we are going to discuss (C). For a cone K C M", the dual cone is defined as 
K* = {y G M" : x'^y > 0, Vx G K}. More generally, the dual cone may be defined with 
respect to an arbitrary inner product on M". A closed, convex cone is regular if both 
it has non-empty interior and its lineality space is {0}. 

Proposition 3.2. {x G M" : I'^x = n, x G /Cr.d} is bounded if and only if d is in the 
interior of K.*]^. 



Proof. Recall that we consider < r < n — 1 and that i^r-,d = {x G M™ : x = d.- z,z & 
/Cr.,i}. Note -^d G {x G M" : l^x = n, X G /C^^,^}; l^d > since d G M!^+. 
Suppose d ^ K,*]^. There are two alternatives: 

(i) there exists z G /Cr,i such that > z'^d = 1^ {d. ■ z); note we can take a G (0, 1) 
such that u = a{d. • z) + (1 — a)d G ICr^d satisfies \ u = 0, u 7^ since fCr^d is 
regular; and, therefore, ttjC? + ru G {x G M" : l^x = n, x ^ )Cr,d} for any r > 0, 

(ii) there exists z G /Cr-,i) ^^ 7^ such that z'^d = 0, simply take u = d. ■ z. 

Conversely, if d G /C* -^ the direction of unboundedness u does not exist. D 

Corollary 3.3. Let LP he such that 1 belongs to the range space of A^ . If d is in the 

interior 0/ /C* j^ , then ILPr^d has hounded feasible region. 

The condition is only sufficient, not necessary. 

By strong LP duality the compactness of LP feasible region is equivalent to the 
existence of e G ffi+_|_ such that e belongs to the range space of A^ , i.e., e = A^y for 
some y G W^. So, assuming LP has bounded feasible region with e as above, if there 
exists X 7^ feasible for LP, we may add the constraint e (x — x) = to the LP 
without changing its feasible region. Therefore, by further scaling the LP variables 
X I— )■ e. • X we may assume e = 1. The last observation potentially allows us to identify 
some candidates for the initial value of d according to (C): Ad = b with d./e in the 
interior of /C* j^, if such exist. 

Alternatively, if LP has a bounded feasible region, one may consider 

min{c'^x + M^ : ^x + 6^-67 = 0, l^x + ^ + 7 = n + 2, x,^,7 > 0}, 

x,S, 

where b = b — Al and M > is a large number. The vector d = 1 is feasible 
for this problem; by the corollary, the hyperbolic relaxation of the problem above is 
bounded. Our new optimization problem corresponds to first taking a standard big-M 
formulation of LP followed by homogenizing the variables using 7 and normalizing all 
variables to a standard simplex. Due to normalization, not all x, ^, 7 may be zeroed 
simultaneously. For large enough M at the optimum .^ = 0; also 7 > 0, for otherwise 
LP must have unbounded feasible region. To complete our justification of (C), observe 
that a solution to LP may be easily recovered from the solution to the problem above. 

Remark 3.4. Here we want to draw the first parallel between the proposed Shrink- 
Wrapping setting and path-following interior-point methods. Later in Section [5] we 
discuss this relationship in more details. Observe that assumption (B) combined with 
an additional requirement that d lies on the central path corresponding to standard log- 
barrier /(x) = — InJ]^"^^ Xj implies that indeed we may choose d with HP^d bounded. 
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In turn, note that (B) combined with existence of strictly dual LP-feasible s : A^y+s = 
c,s £ I^++) implies the existence of the central path. To see how to choose such d, 
consider LP 

minjc'^x : Ax = b,x gW^} 

together with its dual 

mayi{b^y : A^y + s = c,s e M" }. 

Recall that the central path may be characterized as x.- s = fil, fi > 0. So, if d G IK++ 
is on the central path, then fil./d is dual LP-feasible for some /.t > 0. Moreover, note 
that ^l./d is an element of the dual cone ICr^d', follows from the cone inclusion 

where /C*_i ^ consists of all nonnegative multiples of l./d. Consequently fil./d is 
feasible for the dual conic problem to HP^^d^ and by conic duality HPj.,d is bounded. 
The described argument easily generalizes to SOCP and SDP. If no such points d 
is readily available, think "self-dual embedding" for symmetric cones; this gives yet 
another, this time more theoretical justification for (C) - observe that the self-dual 
embedding will nearly double the sizes of the matrices we have to work with if we were 
to consider Newton's like scheme based on linearization of, say, 13.1.11 below for tracing 
x{d), and thus will increase the amount of computations roughly 2^ = 8-fold. 

By convexity of K^.^ and assumption (D) it follows that KKT conditions are both 
necessary and sufficient for optimality in HPj-d, and so the solution x = x{d) is char- 
acterized by a system of polynomial equations 

VEn-r{x./d) + A^y = TC, T > 0, 

En-r{x./d) = 0, (3.1.1) 

Ax = b 

with X E /Cf,,d and y G M™. To see why (D) implies necessity of KKT conditions, 
note that VEn-r{x./d) does not vanish at x{d), for otherwise we must have that 
En-r-i{x{d)./d) = and so by Proposition 12.171 xid) G M" and is optimal for LP. 
Strictly speaking, to characterize x{d) in the above we need to add another set of 
constraints ensuring x{d) G KL^^, e.g., Corollarv l2.16| p{x) = alone does not suffice. 
In addition to assumptions (A)-(E), we will be assuming that 

• (F) rank(y4) = m, 

• (G) LP solution x* is unique and has precisely m non-zeros. 

(F) is a standard convenient assumption commonly underlying the interior-point meth- 
ods and practically may be ensured by, say, performing a QR factorization of A. (G) is 
a convenient assumption that greatly simplifies the subsequent analysis; note that (G) 
is generic in a sense that it holds true almost surely for all infinitesimal perturbations 
of the constraint vector h by strict complementarity for LP, implying that even if (G) 
fails at first, it may be easily restored by slightly perturbing the original problem. We 
hypothesize that in fact (G) may be lifted altogether, but for the sake of compactness 
and readability of the manuscript we do not attempt to verify the latter claim now. 
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Observe that if we fix r = n — m — 1, HPr^d produces a tight fit relaxation to LP: 
any LP vertex, including the optimum, as a nonnegative solution to Ax = b having at 
most m non-zero entries, belongs to dlCr^d, thus hypothetically even allowing x{d) = x* 
for some well chosen d. In Section [H we will see that such a choice is indeed possible. 

Note that most of the observations we made so far may be extended beyond LP to 
other hyperbolic optimization problems, such as SOCP and SDP. 

Remark 3.5. In characterizing the solution of HPr^d, hi particular the boundary of 
/Cr,d) rather than relying on p{x) = r\En{d)En-r{x-/d) hyperbolic with respect to d, 
one may rely the concave on C\d) ratio functional 

q{x^ — 



p'{x) 



Assuming (F) and using q{x), Renegar has observed the existence of the so-called cen- 
tral line - a strictly feasible line segment C whose closure contains x* with an additional 
property that if d £ C then x{d) = x*; moreover, turns out that the Jacobian of x{d) 
for d £ C has a very special structure that allows for a nice geometric interpretation. 

From computational point of view, fixing r = n — m — 1, the usage of q{x) instead 
of p{x) might help one to better address potential numerical ill-conditioning when 
considering the gradient and Hessian in linearized KKT for x{d) such as l3.1.H as q{x) 
is proportional to 

nr=iMx) _(^ 1 y 

while p{x) is proportional to YYlLi ^i{x), and thus q{x) suffers from the additive effect 
of simultaneously zeroing more than one eigenvalue of x, while for p{x) this effect is 
multiplicative, when (G) is lost. Recall that at least one eigenvalue of x approaches 
as X nears the boundary of /C^^rf. 

3.2 Choice of dynamics 

We start by recalhng that d £ R!J:+, but x{d) ^ Wl^. From ICr^ = {x £ W^ : x = 
d. ■ z,z £ /Cr,i} and W^^ C /Cr,i C M" one may conclude that in general, the closer d is 
to a vertex of LP, the tighter the feasible region of HPr^d fits around that vertex. This 
last informal observation suggests that given some initial value (i(o) of d, it might be 
beneficial to update dtQ\ i— t- dn\ so that dn) is closer to the solution to LP, hoping that 
x(d(i)) gets closer to x* . In particular, one could consider obtaining d(i) by moving 
from dfQ\ towards x* . Since x* is not known a priori, we may choose the next best 
possible candidate, namely x{d^Q^) as a surrogate for x* in the above. 

The suggested dynamics for d and x{d) my be formalized through the ODE 

d = x{d)-d, ^3_2_^^ 

d\t=o - a(o)- 

Although, we chose this particular dynamics to govern the behavior of d and x{d), many 
other choices are possible. We are interested in studying the continuous trajectories of 
d{t), t £ [0,oo), where d{t) solves [3231 
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The following statement was conjectured by Renegar: "under LP strict dual fea- 
sibility d(t) converges to x*"; for more details see the very recent [20]. We refine it 
by observing that x{d) might not even be defined if d is chosen poorly, i.e., HPj-^d is 
unbounded; for convenience define x(0) = 0. 

Theorem 3.6. If for all t >0 we have bounded HPj.Mt), then d{t) — )• x* as t —t- oo. 

Proof. Follows from (r d = c {x{d) — d) < since HPr^d is a relaxation of LP. D 

We hypothesize that indeed for ffP^^^^^) to stay bounded for all t > it suffices to 
choose initial value d(o) corresponding to bounded HPr^d,m ) i^ which case both d{t) 
and x{d{t)) converge to x* as t — )■ oo. 

Note that with choice of affine feasible dm) j the trajectory d{t) remains affine feasible 
for all t > 0. Let columns of B form a basis for null(^). Any affine feasible point d 
can be written as d = dm\ + B6 for some 6 G R""*". ODE [3l2?T] may be re-written as 

B6 = d(o) + B^S) - d(o) - B6, 
B6\t=o = d-d(^o), 

where x{d) = (i(o) + BS,{5). Since B is injective, the above is equivalent to 

5 = m - s, 

S\t=o = (5(0). 

Consequently, we can pick an arbitrary affine coordinate system of {x € M" : Ax = b} 
to analyze 13.2.11 Thus, we call ODE [3r2?T] affine invariant. 

A corresponding discrete algorithm may be based on approximating the trajectories 
d{t) and x{d(t)) iteratively, generating a sequence of pairs, {di,Xi), i = 1, . . . ,oo: 

• given di, compute Xj ~ x{di), 

• set dj+i = di + a{xi — di) for some properly chosen a € (0, 1), iterate. 

Note that characterization 13.1.11 suggests a way to trace x{d) when d changes ever so 
slightly using, for example, Newton's method. 

Remark 3.7. Although, as we will see in Section [5l trajectories d{t) generalize the 
notion of the central path, there appears to be no known analogue for x{d{t)). In our 
limited numerical experiments x{d{t)) typically converged to x* much sooner than d{t), 
which suggests that algorithmically one might focus on tracing x{d{t)). 

In the subsequent section we formally introduce the notion of the central line C 
which acts as an invariant set w.r.t. dynamics of the Shrink-Wrapping iterates d 
(and x(d)) - invariant in a sense that if d{to) G C for some to, then d{t) £ C for 
all t > to- Next, we devote our attention to studying Shrink-Wrapping trajectories 
near the central line; in turn, choosing the neighborhood of this invariant set properly 
will enable us to lift the extra assumption on HP^Mn to stay bounded for all t > 0. 
We observe the special structure of the Hessian of x{d) where d belongs to C relying 
only on polynomials in characterization of solution to HPr^d- The latter allows us to 
significantly simplify the analysis of Shrink-Wrapping trajectories in the neighborhood 
of £, as compared to [23], and show that the central line acts as attractor set for d, 
provided the initial iterate d(o) '^^s chosen significantly close to C 



4 On Shrink- Wrapping trajectories 
4.1 Invariant central line 

For a set of indices B let xts denote a vector with coordinates Xi, i £ B, e.g., 



2 

5 

V6/ 



{3,4} 



We also simply write x_j when we want to obtain a vector from x of dimension one less 
by dropping i coordinate. Let x.^ = x.*x; as usual, power operation takes precedence 
over multiplication or division. Component-wise vector operations take precedence over 
standard operations on vectors, and otherwise occur in order of appearance. For two 
vector-valued functions x,y : "P — )■ M", we write x = 0{y) if there is a constant 
K > Q such that \xi\ < K\yi\, \/i = l,n, \/p G V. [A]B] denotes vertical block-matrix 
consisting of ^, -B: 

A 



For a matrix A we use A-^i to denote its i column. 

From now on, hxr = n — m — 1 in HPy.^d\ note x* belongs to the boundary of tC^^d- 
Given (E) , without loss of generality we may assume the last m coordinates ox x* to be 
non-zero. We choose the following parametrization of {x € M" : Ax = 6}: let ^ denote 
first n — m components of affine feasible x. Note that ^ = at the LP optimum x* . 
Fixing B = {n — m + l,n — m + 2,. . . ,n}, xq G M™ corresponds to last m components 
of x; note Xq is a vector of (non-zero) basic components of x* , compare this notation 
with the example in Section [3l Similarly, we denote 5 to be first n — m components of 
hyperbolicity direction vector d, and djs its last m components. 

Note that if x is affine feasible, we may re-write Ax = h as xq = A^ + x^ for some 
A G M"^x"-™; similarly de = M + x*^. Recall that ODE [323] is affine invariant; thus, 
to understand d{t) we may equivalently analyze trajectories 5{t) of 

Definition 4.1. An open linear segment £ C {x G I^++ '■ ^x = b} whose closure 
contains x* is called the central line if for any d € C we have x{d) = x* , and C is not 
a proper subset of any other linear segment with the above properties. 

Since we will be mostly working with first n — m coordinate parametrization of 
{x G M" : Ax = b} and, in particular, ODE 14. l.!) we allow for a minor abuse of 
notation by using the same symbol C for the central line wether when referring to a 
subset of M" as in the definition above, or its projection onto first n — m coordinates. 

Proposition 4.2. The central line exists. 
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Proof. Rewriting the first equation of conditions 13.1.11 in the first n — m coordinates 

[/;I]^Diag(l./d)V,i?™+i(z)U=,./rf = T[I;Afc, 

and observing that {VzEj-n+i{z))- = Em{z-i)-, i = 1, . . . , n, at 2; = x* we get 

Era{dB)l./5 = T[I-A]^C, (4.1.2) 

recalhng that x* has only last m non-zeros. 

Since x* is a unique minimizer for LP, we must have that 

and so [/; j4]-^c G R""!,^™ by elementary LP conic duality. 

Set 5 = 1./{[I]AY'c) G M""^™ and observe that Amax > may be chosen such 
that the linear segment £ = {d G M++ : d = x* + [I;A]6- A, A G (0,Amax)} is the 
largest possible. In turn, any 5 corresponding to (i G £ will result in positive r in l4.1.4t 
moreover, clearly x* G dlC^^d since, again, x* has precisely m zeros, and so 13.1.11 is 
satisfied at x* . Therefore, for any d £ C we have x{d) = x* . D 

Due to affine invariance of ODE [3?2?T] for the purpose of its analysis, without loss of 
generality, we assume that [/; A]'^c = 1 and consequently 5 = Al, A > 0, when d € C: 
if not, simply re-scale the first n — in coordinates accordingly. 

Observe that the elementary syminetric polynomial in x = [.^; xq\ G M" satisfies 

Em+i{x) = EiiOEmixB) + E2{0Em-l{xB) + E^{i)Em-2{xB) + " " " • (4.1.3) 
Proposition 4.3. The Jacobian of S,{5) for 6 & C is of the form 

J ^ EUx*B-/dB) (j _ 11^ 
^ E^^i{x*Q./dB) V n-m 

Proof. In order to compute the derivative of ^{5) for 6 & C we implicitly differenti- 
ate [3TTTTJ Consider a vector ^Si of partial derivatives J^ ' - the first column of J^{6)'^ . 
Differentiating second equation of 13.1.11 and re-writing it terms of 5, ^, recalling 
that 14. l.i] implies V^Em+i{x./d)\^=o is a positive multiple of 1 since 5 is also a positive 
multiple of 1, it follows that £,Si is orthogonal to 1, that is. 



In order to differentiate the first equation in l3.1.H we first revisit the expression of 
the gradient of Em+i{x./d) in coordinates 5,^: note that 14. 1.3] implies 

V^Era+l{x./d) = V^Err,+l{[e,XB]./[d]dB]) 

= V^ {Ei{C./5)Em{xB./dB) + E2{C./5)E^^i{xB./dB) + ■■■) 

= Diag(l./5)^1 Em.{xB./dB)+ 

Ei{C/6)A'rD[ag{l./dB)V,Em{z)U^,^,/a^+ ^ ' ' > 

Biagil./S)y,E2iz)\,=i:,/sEm-iixB./dB)+ 

E2{C./5) A'^Dmg{l./dB)V,Em-i{z)\,=,^,/ds + • • • • 
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Differentiating the above and evaluating at .^ = we get 

^ (Diag(l./5)1 E„XxB./dis)) = Diag([-l./(52. o])l Em{xl./dis)+ 

Diag(l./5)1 V,Em{z) 




z=XB./ds)='^ Uh-I^)^ Diag(l./dB)V^E™(z)U=^.,/rfg, 



T 



^ (Diag(l./(5)V,E2(^)U=e/5 E,n-i{xB-/dB)) = Diag(l./<5)(11^ -I)Dmg{l./5)i5,E„,.^{x%./dis) 



and 



^ (i?2(eM) A^Diag(l./dB)V,S^_i(z)|,=,^,/,^ 



with all the remaining "higher-order" in ^ terms in the expression for ■t^'sI ^Em+i{x . / d) 
being zero. As a result, sX 5 = Al G C where A > and corresponding ^ = 0, the first 
equation of differentiated 13.1.11 becomes 

^[/;I]^c= [-l//\^-MEm{xl./dB)+ 

^V,ii;„(z)|J^^^ /,^ {{Ais,)./ds - x%./dB.\ ■ A,i) 1- (4.1.5) 

-S^£,6i Em.~l{x*Q-/dt3), 

observing the cancelations due to the orthogonality condition 1^ ^Si = 0; note that the 
affine feasibility requirement is satisfied by the choice of coordinates. 

Finally, to compute ^Si we need to solve 14.1.51 together with 1-^ ^^^ = - a system 
of n — 77Z + 1 equations in n — ?ti + 1 variables ^Si , ^ • Pre- multiplying both sides of 
the expression by 1^ , recalling [/; A]'^c = 1, we obtain 

dr _ -Emix^./dts) Lo_ 
dSi (n — 7Tt)A2 A 

where 

, , V7 I? f ~\ 

\z=x*g./dB 

Now, using the expression for ^ we may re- write 14.1.51 as 

"(n'^-m/A?^ ^ = [-1/A^;0] Emix^./dB) - ^4 Em-iix^./drs), 

resulting in 

■ E,n{x*e./dB) ( 1 

^'' Em^,{x*^./dB)V^'^ n-m 

where en\ = [1; 0] G M""*" is the first unit vector. 

Similarly, we derive the expressions for q\. , i = 2, . . . ,n — m. D 



u = VzEm{z)\^=^,,. i{Ais,)./dB - X*B./dB.^. ■ A,i 
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The Jacobian of ^(5) for 6 ^ C may be interpreted as a negative projection onto 
the nuh space of \^ with a corresponding multiple „ *" ,% F, \ > - recall that 

d = [S-ds] G WI+ for 5 e C; note that ES^x'TdB) '^^ ^^^*^ ^°^ ^^^ '^ ^ ^- ^^^ *"™' 
this implies that, up to first order, a small deviation of 5 from C in the direction 
orthogonal to 1, that is, orthogonal to the central line, results in the displacement of 
the corresponding ^{5) in precisely the opposite direction, see Figure [5j 

The last observation suggests that C might be an attr actor set: when considering 
the dynamics of l4.1.H note that small deviations of 5 away form C appear to be counter- 
acted by corresponding changes in ^{5) away from 0, thus, forcing 5{t) to cross-over 
the central line. In what follows we will see that indeed this is the case. 



4.2 Trajectories near central line 

It is convenient to introduce the following orthogonal decomposition of (5 € M"^" 
5 = 5|| + (5_L, where 5\\ = Al, A > 0, and 1^5^ = 0. 
Intuitively, if 

for some > and the approximation above is "accurate enough", we expect 



hit) 



5|,(0)-e-*, 



and so 



hit) 


fv 


6^{0) 
<^||(0) 



~et 



as t — 7- oo. 



Note that the Jacobian of S,{S) for 5 £ C suggests that the system |4. 1 . 1 1 indeed assumes 
the form of ODE as above, at least in some vicinity of the central line. However, as we 
witness in this subsection, although our intuition proves to be correct, we have to be 
quite careful since ^{5) governing 14.1.11 might easily fail to be differentiable at 5 = 0. 

For the next lemma we allow for a slight abuse of notation using 5\\ to denote the first 
coordinate 5i of a vector 6, and 6± to denote the vector of the remaining coordinates 
6-1 in some orthonormal basis; note that this is consistent with, say, equipping M""™ 
with a system of orthonormal coordinates where the first coordinate axis is aligned with 
1. The quality of the approximation in the ODE above that suffices for our purposes 
may be characterized by the following statement. 

Lemma 4.4. Let 6 be governed by the following ODE with locally Lipschitz continuous 
right-hand side 



+ 



\s±\\' 
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for some 9 > 0. Then for any fixed Ai > there exists e > such that for any initial 
6(0) = (5|| (0) + 5± (0) in the central wedge 



W = {6: \\5±\\ <<5|| -e and A e (0,Ai)} 



we have 

for some v > 0, and 



S\\{t) <S»{0)e 



-ut 



s±it) 



hit) 



< 



s±{o) 



^11 (0) 



-cut 



for some uj > Q, and so 5{t) G W for all t > 0; moreover, a; — t- 0, z^ — t- 1 as 



MO) 

■5||(0) 



0. 



Proof. Since the right-hand side of the ODE above is locally Lipschitz continuous, 
the unique and continuously-differentiable solution 6{t) exists for any choice of initial 
6i\ (0) 7^ and arbitrary (5_l(0), at least on some open interval oft containing 0. Consider 



1 d 
2di 



S± 



6^ h d^ 



^TofM^i 






and note that by Cauchy-Schwarz inequality 



OJ 



t=0 



1 d 

~ 2di 



t=o 



where 



OJ 



9-K 





-K2 




2\ 



t=0 



and the constants Ki , K2 > satisfy 



and 



Clearly, a; > provided 



V '^11 






< K 



<K2 



. \\S±\\' 

m 
"5, IP 



'II 



5x(0) 



5||(0) 



is small enough, e.g., (5(0) G W for sufficiently small e. 



Continuity of 6{t) implies ^ 



< for t G [0, r) for some r > 0, and so 

is decreasing on [0, r); therefore, if 5(0) G W and, in addition, 5||(t) is non- increasing, 
then 6{t) G W for all t G [0,r). Moreover, for any fixed k > we may choose r such 
that for all t G [0, r] we have 



0< 



UJ 



S± 



1 + K. 



< 



t=0 
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1 d 

2 di 



S± 



and so 





2 


t=T 


6± 


2 



t=0 



t=odt 



'W 



dt < 



2uj 

1 + K 



s± 



'II 



T. 



t=0 



Similarly, differentiating Su and choosing e small enough in W 9 (5(0), we may show 
that for sufficiently small r > we have 



ld_ 
"2di 



-<5ii 5ii 



.r.„ofM^i 



'II "II - V 6n 



>si 



l-Ko 



6± 



>0 



for t € [0, r], and so 5\\ is monotone-decreasing on [0,r] implying 5{t) € W, and 



t=T 



< 



where 



t=o 



l-i^2 



1 + K II 



6± 



t=o 



'II 



t=0 



Noting that since 5{t) G W, the argument above may be repeated at t = r treating 
it as i = 0, we observe that the solution 6{t) G W with the above properties may be 
extended to any t > 0. Finally, it is left to recognize the exponents in the bounds for 



s± 


2 


t=T 




2 



and 5d 



t=o 



t=T 



t=o 



while letting r — ;• 0, followed by k — )• 0. D 



Observing that our proof relies on the big-O form of the ODE only in some central 
wedge W, that 9 only needs to be bounded away from on W, and choosing the 
coordinate system for 5 € M""™ so that the first coordinate is aligned with 5\\ , we may 
state the following result; recall 6 = 5n + 6± forms orthogonal decomposition of 6. 

Corollary 4.5. // in some central wedge 

W = {(5 G M"-*" : (5 = 5|| +(5x,5|| = Al,1^6± =0,\\S±\\ < ||(5|||| •?, and A G (0,Ai)} 
S,{6) is locally Lipschitz continuous, and the 0DE \4-1A\ may be re-written as 

S = -S-e-6± + 0mj\-l] (4.2.1) 

where 9 = 9{6) > is bounded away from on W, then there is a possibly smaller 
central wedge W C VV corresponding to e <e, 

W = {(5gM"-" :'^ = '^|| +^±,S\\ =A1,1^6± = 0, ||(5±|| < ||5|||| • e, and A £ (0,Ai)}, 

such that for some fixed i^jOJ > we have 



¥t)\ 



< 



\hio)\ 



SAt) 

5|iW 


< 


5x(0) 
■5||{0) 



'Ut 



-Ljt 



for any 5{0) G W. 
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We say that 6 converges exponentially to C if 



s±{t) 


< 


S±{0) 



-Ult 



LO > 0, and 

< \\6{0)\\ e~^^,ri > 0, see Figure EJ we say that ^ = ^(5) converges exponentially 
to if ||'^(t)|| < ||'^(0)|| e~™*,ro > 0. The main result of this section is as follows. 

Theorem 4.6. For any Ai G (0, Amax); there is a corresponding central wedge W such 
that if S{0) £ yV then 6{t) converges exponentially to the central line C. Moreover, the 
corresponding S,{t) = S,{S{t)) converges exponentially to 0. 

Observe that to prove the theorem, by Corollary 14.51 it is left to exhibit that indeed 
the ODE |4?LT] may be written in the form l4.2TT] in some central wedge W. 

We start by investigating the behavior of ^((5) for 6 near the central line. Recall 
that the necessary and sufficient conditions 13.1.11 for x{d) may be re-written in terms 
of 6,^ to characterize ^{6) by 



V^Em+ii[C;xB]./[5;dB])-Tl 
Em.+imxt3]./[5;dt3]) = 0, 



0, 



where the expression for V ^Em+i{[C; xis]./[5; dis]) may be found in 14.1.41 t > 0, and, 
additionally, [C;xb] G K.n-m~i,d captured, for example, via Corollarv I2.16| note that 
r > guarantees that ^ corresponds to the minimum and not the maximum in HPr^d- 
First, we drop the positivity requirement on r and consider the conditions for the 
extremum of HPrd, treating 6 as a fixed parameter, which may be written as 



fiO 



projix {V i.Er,^+l{[C■, xt3]-/[d; de])) 
Em+i{[^;xB]./[S;dB]) 



OG 



where proj]^± is a projection onto the subspace orthogonal to 1 in some suitable basis, 
e.g., proJix(a;) = P^uj, P = [1^;-/] G ]^(n-m)x{n-m-i) ^ r^^iat is, for a fixed 6, ^{6) 
corresponds to a root of /(^) and the above produces n — m polynomial equations 
in n — m variables. Precisely for this reason we do not hope to obtain a closed-form 
algebraic expression for S,{S), as it is well known that even a single- variate polynomial 
of degree five and higher in general is not solvable in radicals. Instead, we attempt 
to approximate S,{5). The two main tools that we rely on are the Implicit Function 
Theorem and Newton's method. 

For fixed 5 G M corresponding to strictly LP-feasible d, the Jacobian of /(^) at 0, 



JfiO) = 
may be inverted by solving 



P^VlE„^+i{[C;xB]./{S;di3]) 
V.E„^+i{[txB]./[S;dB]f 



5=0 



Jf{0) • Ag = -/ (4.2.2) 

for an arbitrary vector / G M""™. Observe that just as the first equation in /(^) = 0, 

proJi± {\/^Em+i{[C;xB]./[S;dB])) = 0, 

is satisfied if and only if there is r such that 

V.E,n+i{[txB]-/[6;dB]) - Tl = 0, 
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same holds true for the Unearization of this equation with respect to ^. That is, while 
solving J/(0) • Ag = — / for A^, we may equivalently consider 

V|i?„,+i([e;xB]./[<5;dB]) • A^ - rl = -P(P^P)"l/-(n-„^), 
V^E.^+M;xB]./[5;dB])^ ■ A^ = -fn-m, 

where the gradient and Hessian are evaluated at ^ = 0, which, in turn, becomes 



Em-iix^./drs) ■ {1./5 (C./5)^ + C./5 {l./5)^) ■ A5+ 

E^_i{x*^./dB) ■ Diag(l./<5)(11^ - /)Diag(l./<J) • A^ - rl = -P(P^P)-V-(n-m), 

E^{xl./dB) {l./5f ■ A^ = -/„_™, 



with 



^ = ^ nr7yTDiag(5) (A' Diagil./dB)V.E^iz)U=., /d^) (4.2.3) 

Em~l[XB-/dB) ^ B / 



and Em-i{x*B-/dB) 7^ 0. Noting that the second rank-1 term, Q./5 (l./(5) , of the 
Hessian in the equation for A^ above may be replaced by ^ ~{x*'Td ) ^-/^ "^^^ *° *^^ 
second equation in the above, pre-multiplying the first equation by Diag((5), we get 

(4.2.4) 



Em-iix*B./dB) {If + 11^ -l)-A^ = r6 + f, 

Emix^./dB) 1^ • Ag = -fn-m, 



where 



and 



Ag = Diag(l./5)A5, (4.2.5) 



/ = -Diag(5)P(P^ Py'f^in-m) + --— -;yj--^ C. (4.2.6) 

Em[XQ./aB) 
Pre-multiplying the first equation by 1-^ and using the second equation, we get 

Err,-i{xB./dB) [{n-m)C Ag - + = r 1 6 + 1 f 

\ Em{XQ./dB) Em{XQ./dBj J 



and so 



^ = T^S [^-^^-'S./d.) ^(n - m)C A, - ^^(^._/,^) + E^ixydB)) " ^ ^j " 
Substituting the expression for r back into the first equation of 14. 2. 41 we have 

r, A^ - ^ ( (^ - rn)fn-m fn-m 1^7 \ 7 



^c 



1^6 y Em{x*^./dB) Em{x*^./dB) Em-l{x*^./dB)J Em-lix*^./dB) 
where 



I,= (i-(!^.-)c- + {ii--/). 
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In turn, the above may be resolved relying on the Sherman-Morrison formula for the 
rank-1 update for the inverse of D, noting that (H"'" — ^) = —I — — 



-{n—rn) ' 
T 



1 — {n — m) p 



provided 



p = V5-V5VC + {n-m)Q'5i^Q, (4.2. 



and so, assuming 14. 2. 8^ we may compute A^ and, consequently, A^; finally, this allows 
us to recover the inverse of Jf{0) from the solution of 14.2. 2[ Note that if we write 
(5 = 5|j + (5_L, (5|| = Al, 1^6± = 0, the expression for p becomes 

p = {n-m) (A + C'^5±) = {n-m)(l + C^^ ) A, (4.2.9) 

and thus 14.2^8] may be easily satisfied by choosing 5 with ||5_l||/A small enough, that 
is, by choosing 6 close enough to C. 

Recall that according to our assumptions C = {6 € IR"~™ : 6 = Al, A G (0, Amax)}- 
One may formulate the following simple technical proposition. 

Proposition 4.7. For any fixed (Ao,Ai) C (0,Amax) there exists e > such that for 
any 6 in the truncated central wedge 

r={(5GM""™ ^'^ = ^11 +^±^h = Al,1^5± =0, ||<5x|| < ||5|||| -e, andAe (Ao,Ai)} 
S,{S) is smooth and we have 

Proof. For a moment, consider /(^) as a function f{6; ^) of two vector variables 5 and 
^; note that / is C°° with respect to both 6 and ^ on the interior of LP feasible region. 
Since ^{S\\) = and J/(0) is non-singular on £, by the Implicit Function Theorem for 
any A G [Ao,Ai] there is a smooth function ^a('5) such that /(5;^a(<^)) = for 5 in 
some open ball B^{6it) of radius e > centered around 5ii = Al; clearly, the union of 
such balls over all A € [Ao,Ai] forms an open cover of [Ao,Ai]. By compactness of 
[Aq, Al], we may choose an open finite sub-cover U = Uj=2 fc -^^^('^Im) ^ [Aq, Ai] and 
since i?^. ((5|i j) overlap with one another, we may construct a smooth function S,ui^) 
for 6 in an open neighborhood U of [Aq, Ai]. In particular, Cu{^) is twice continuously 
diff'erentiable and e > may be chosen small enough so that the truncated wedge 
T CU. The result follows from Taylor's expansion of iu{^)- 

Finally, observe that ^ = ^t/(5) indeed corresponds to the minimizer of HPr^d for 
some fixed d = [S^d-e], and is not an arbitrary root of /. To show that x = [^;xb] S 
ICn-m-i,d note that Proposition 12.171 implies that the branches of Em+i{x./d) = 
are distinct except for at the faces of M" , and thus, if $,u{^) switches branches, then 
it cannot be smooth or even continuous; therefore, ^u{^) must result in x{d) that 
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additionally satisfies the conditions of Corollary 12.161 Similarly, since the feasible 
region of LP is assumed to have a non-empty interior, and so, in particular contains an 
open ball of some radius e > 0, ^u{^) cannot switch from being a minimizer at .^(0) = 
to being a maxiniizer at some other 6 € T and yet stay smooth, because HPr^d is a 
relaxation of LP and consequently c^x{d) must be smaller than the maximum of HP^^d 
by at least ||c||e > 0. D 

Remark 4.8. Strictly speaking, when discussing the Jacobian of S,{6) in previous sub- 
section, we should have justified the existence of differentiable ^{5) first as in the above 
proposition. We intentionally delayed this discussion till the present subsection in an 
attempt to keep our motivation more transparent. 

The above and Proposition 14.31 result in the following straightforward consequence. 

Corollary 4.9. For any (Ao,Ai) C (0,Aniax), the truncated central wedge T may be 
chosen so that ^(6) is continuously differentiable for 5 £ T and 

V ll'^iill 

for some 9{6) > 6 > 0. 

Note that the big-0 constant in the above might depend on the choice of Aq, Ai. 

If we could show that S,{6) is continuously differentiable in some neighborhood 
U oi 6 = 0, hy combining U with T the last corollary would imply that for any 
< Ai < Ajnax we may chose the central wedge W C ^UT as in Corollarv 14.51 where 
S,{5) is continuously differentiable, and so is locally Lipschitz continuous. Moreover, if 
S,{6) was well-behaved on l/( in a sense of l4.2.1|, this would imply our main result. 

That is, currently, not only we cannot guarantee that the quality of big-O approxi- 
mation in the above corollary for 5 G T does not deteriorate too fast as Aq gets closer 
and closer to 0, we are not even guaranteed that ^((5) is smooth enough in any central 
wedge W to guarantee the existence of the solution to ODE [^TT] near 6 = 0. 

In particular, recall that the existence of the inverse of Jf{0) depends on l4.2.8( and 
so, potentially may be compromised in the limit as 5 — >• 0, preventing us from being 
able to extend T in the above proposition and corollary to enclose 6 = 0. Indeed, as 
the following example illustrates, ^(5) may fail to be differentiable at 0. 

Example: 

• considering the same problem as at the beginning of Section [3] with its relaxation, 
min^^Kl, l,0)"^x : l^x = 3, x £ Rij_} and min^^Kl, l,0)"^x : l^x = 3,x G /Ci^^}, 
we derive the explicit expression for ^{6). 

LP optimum is x* = (0,0,3), so B = {3} and 6 = {di,d2),^ = (xi,X2). Recall 
that the boundary satisfies i^^'^Q^ + "^^^ + s = with Q, r, s as before, namely, 

n- 6 ( -2^2 3-2(5i+52)A ^ _ 6 ( h 

^ di^2(3-5i-^2) \^ 3 _ 2{6i + 62) -26i ) ' <3-id2(3-5i-d2) \^ Si 

The optimality conditions for ^[6) correspond to 

V5 {]^i^Qi + r'^i + ^=Qi + r = Tl,T> 0, 
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and recalling det{Q) = — 9 + I2{6i + 82) — 4((5f + 5|) - 4(5i(52, for small enough 
6 G IK++ may be equivalently re-written with r > as 



e 



1 



-26i 



2(<5i + 52) - 3 
det(Q) V 2(51 + ^2) -3 -2^2 



t1 



S2 
Si 



Substituting ^ back into the boundary condition to get r we get 



-2 -,Tp^~i 



t'VQ~'1-? 



•T^-l 



f =0, 



with r 



Si52(3-Si-S2) 



r, and so 



■\/SlS2 



as out of the two quadratic roots we are interested in positive r . Finally, observe 
that r results in S,{6) not being differentiable at 5 = 0. The figure below illustrates 
a position of ^((5) in relationship to its Jacobian-based approximation for one 
particular 6. 




Figure 5: Shrink- Wrapping dynamics close-up 



Remark 4.10. Non-differentiability of ^{6) at 6 = also prevents us from relying on 
a standard ODE sinfc-type argument [1], as clearly, in the way it is defined, S,{S) does 
not even exist beyond the nonnegative orthant. It is conceivable that from purely 



29 



algebraic point of view one may extend ^{S) beyond M'^^'" as, say, a solution to the 
polynomial system of equations. However, the basic problem of non-differentiability 
at (5 = is still likely to persist if we continue using Euclidian coordinates for ^, 5. 
Along the latter lines, in |23j it has been suggested that perhaps a non-liner change of 
coordinates, namely, spherical coordinates, might be a more suitable choice to address 
the problem; in particular, such a choice allows to overcome this difficulty in the 
example above and subsequently permits the usage of a sink. To justify the existence of 
a continuously differentiable ^{5) in spherical coordinates beyond nonnegative orthant 
one may attempt to use the Implicit Function Theorem. However, here we choose to 
follow Newton's method-based analysis as, hopefully, it may subsequently be used to 
lay down the ground work for the path-following in the actual optimization algorithm. 

To remedy the situation, we rely on a different approximation to ^{5) for S near 0, 
namely, we use the first iterate of Newton's method and its error analysis as in [5] . 

Fix 6 £ C so that the corresponding ^{6) = and consider 5 £ M"^™, 6^6. If 5 
is chosen close to 6, we may attempt to approximate ^{5) by finding an approximate 
root ^(5) + A^ = Ag of f{^) based on linearization at 0. That is, we solve the following 
equation for the Newton step A^: 

/(0«/(0) + J/(0)-A5 = 0, 

where J/(0) is the Jacobian of /(^) at ^ = 0. 

Intuitively, in the limit as 5 — )• (5 the Newton step A^ must resemble the first-order 
approximation to S,{6) obtained with the Jacobian J^{5): both rely on linearizations of 
S,{S) but at ever so slightly different points 5 and 6. The latter provides a key motivation 
for working with Newton iterates, as we hope to obtain a usable approximation to ^(5) 
that takes on a nearly-projection form similar to J^{5) acting on 6 — 5. 

To compute the Newton step Ag we specialize / to /(O) in 14.2.21 that is, 



/ /_(„_„,) \^( P^V^Er,^+i{[e,xts]./[S;dts]) \ 

^ V fn~m J \ J 

With the above in mind, we have 



5=0 







-Biag{5)P{P^P)-^P^Em{x%./dB)l./5 
-i?^(x^./dB)Diag(<5)(/-^) 1./6 



and 

A_ ( ^'^f ] + 7 ^ E^{x*^./dB ( n-m 

I'rS \ Em-l{xl./dB)) Em-l{xl./dB) Em-l{x*^./dB \ 1^6 

Thus, using the earlier expression for D~^, distributing all the terms and simplifying, 
we get the following expression for the scaled Newton step A^: 

n—m r 



A, 






= -^:^^^Hin-m)6-l-8l). 
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Using 5 = (J|| + 5_L, (5|| = Al, 1 (5_L = 0, and observing 

1^6 - 1^6 l^C + (n - 7n)C^d = (n - m) (A + C^Jj 
we get 



Ernix^./ds) 



S± 



and so, re-scaling by Diag((5) we finally have 

Em.{x*^./dB) / A . 1 



A, 



i?n^-l(x^./dB) \A + CT§_ 



S± + 



A + C^<5j 



iS±)-' 



(4.2.10) 



For real analytic /, it is well known that under mild non-degeneracy assumptions, 
namely the invertibility of the Jacobian of / at the root ^, Newton's method converges 
quadratically to the associated root ^ if the initial iterate z is chosen close enough to 
^. To this extent we formulate a slightly more specialized and simple result following 
the analysis in [5] , introducing two auxiliary quantities 

p. = \\f{zr'f{z)\\, 

which corresponds to the length of the Newton step at z, and 

f'{z)-'f(>^)iz) 



Iz 



sup 

fc>2 



kl 



1 
fc-i 



Lemma 4.11. There is a universal constant ai > such that if 

Pz-lz < ai 



a. 



then the distance from z to the associated zero ^ decreases quadratically with each 
Newton iteration starting from z, that is, denoting Z(o) = ^ ^.nd 



for all i > we have 

\\z(i+i) -Cll < 

where ^(n) 



Z{i+i) = Z(^i) - f (z(i)) / (z(i)) , i > 

^Iz 



■\\zii)-i\Y 



(5-Vl7)^(2a,)(l-2a, 
lu^ — 4ti + 1, and, moreover, ||z — ^|| < 2/3^. 
Proof. Pick Qi > so that 2ai is less then the real root of 2u^ — 6u^- 



5 + 



u—1: 



since ^(u) is monotone decreasing for u < I — ^, for u G [0, 2ai) we have ^, w,__ . < 

Y^ . By the cubic root formula it may be verified that the decimal expansion of the 
real root of the above polynomial, truncated to first five significant digits, is .11218. 

Note that ai < ao = j{lS — Sy/vf) ss .15767, where ao is the best known value for 
the constant in Theorem 2 in Section 8 of [5]; therefore, the theorem implies 

20^ 2ai 

Iz Iz 
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Since 11^ — ^11 ' Iz < ^ai < 1 — -^, Proposition 3 in Section 8 of [5] implies 

7? < 



2 

Iz 



*(2a,)(l-2a,) 
Since ||z — ^|| • 7^ < ^ • ^,^^ 7(i-2a ) ^ '^4^1 by Proposition 1 in Section 8 of [5] 

follows. Observing that the above inequality, in particular, implies ||2(i) —^11 < II-2~C||; 

the last proposition may be re-applied to estimate ||z(2) ~?ll since ||z(i) — Cll "75 < ^"^''''^ 
and so on, thus, completing the statement of our lemma. D 

In particular, we may choose ai = .11218, in which case 

II {z - f'{zr'f{z)) - i\\ < 207. • \\z - ef < 8O7. • ^i (4.2.11) 

Proposition 4.12. For any fixed < Ai < Amax there exists ?> such that for any 
6 in the central wedge 

W = {(5 G M""™ : (^ = ()^ll +5±,5i\ = Al,1^6± = 0,\\5±\\ < ||5||||-?, and A G (0,Ai)} 

^((5) is smooth and we have 

Proof. We rely on the result of the previous lemma, namely, 14.2.111 Note that C, as 
defined bv l4.2.31 remains bounded from above on any LP strictly feasible closure of W. 
Consequently, the existence of the Newton step Ag guaranteed by I4.2.8|. recalling I4.2.9| 
may be ensured for any (^ G W by choosing e > sufficiently small. So, for a moment, 
fix e such that^llCll < K, K > 0, for aU 5 G W, and 1 - if? > 1/2. Then for 
5 = 5|| + (5_L G W bv 14. 2. 1(1] we have /3o = ||A^|| = ?0(||(5j| ||). If necessary, we will refine 
our choice of W at a later point by further reducing ?. 

It is left to analyze 70; recall that for an operator F its (induced) norm is defined 
as supyj^o II II • Since / is polynomial, for k > m + 1 the differential f^' vanishes. 
For k < 771 + 1, the /c-order differential of /, evaluated at a fixed fc-tuple x is a vector 
whose first n — m — 1 components are of order 1/||5||'^"'"^ with respect to 6, and the 
last component is of order 1/||(^||'^. Recalling that /'(0)~^ = Jf{0)~^ may be recovered 
from the solution to l4.2.21 observing from l4.2T71 that ||-D^^|| = 0(1) on W, and recalling 
the definition for / as in 14.2.61 particularly, that the first n — m components of / are 
scaled by Diag((5), and the fact that from 14. 231 we have A^ = Diag((5) A^, we conclude 
that the composite A;-linear operator f'{z)~^f^'{z) acting on a fixed /c-tuple x results 
in a vector of order 1/||(5||'^~^. Now, applying the induced norm and taking k — 1 root 
we conclude that 70 = O = (1/||<J||) = 0{l/\\5\\ \\) on W. 
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Combining our estimates for /3o,7o, we get ao = /^o " To = ^0(1) on W. So, if 
needed, e and the corresponding central wedge W may be further reduced to result in 
ao < «! on yy, completing our estimate on ^((5). 

Finally, observe that ao < ai on W in particular implies that JfiS,{S)) is invertible 
since 7f(5) is finite, see the above lemma. Invoking the argument similar to that of 

Proposition 14. 7| considering a finite open cover of the closure of W, the Implicit Func- 
tion Theorem implies the existence of a smooth ^(5) defined on W that corresponds to 
the minimizers of HP^^d- D 

The above combined with 14.2.101 result in the following straightforward consequence. 

Corollary 4.13. For any < Ai < A^ax; ihe central wedge W may he chosen so that 
S,{6) is continuously differentiable for 5 £ W and 



for some 9{6) > 6 > 0. 

The last corollary completes the proof of our main result - Theorem 14. 6t the behavior 
of ^{t) is a straightforward consequence of exponential convergence of 5{t) to C. 

Remark 4.14. The actual basin of exponential convergence to £, that is, a subset of 
LP feasible region starting from which the trajectories 5{t) converge exponentially to 
C, and consequently, to the LP optimum at 6 = 0, inight be far more complicated than 
simply a central wedge W; for once, such a set must necessarily contain the union of all 
the central wedges as in Theorem 14.61 each corresponding to different < Ai < Amax- 
Moreover, instead of relying on Newton's method-based analysis of ^{S), alternatively 
we could combine the central wedge W for small \\6\\, addressing the potential non- 
differentiability of ^((5) at 0, and the truncated central wedge T for \\6\\ relatively 
large. Note that intuitively, for 5 = 6\\ + 6± close to vC in a sense of small ||5_l||/||5|| ||, 
the approximation to S,{5) based on the Jacobian Js^i^u) becomes 

while the Newton step approximation with small \\6\\, ||5_l||/||(5|| || also results in 

i{5) « Ag « <5x, 

Ljm~iyXQ./dB) 

SO we expect the two approximations to act alike, see Figure [5j In addition to the 
above, if we were to rely solely on Newton's method, it is well known that the basin of 
convergence for the method alone may very well be extraordinarily complicated - see, 
for example, Mandelbrot set [5]. 

We conjecture that 5{t) enters the central wedge W for some t > if ^((^(0)) exists. 

To illustrate the kind of implications continuous trajectories (5(t),^(t), t > might 

have for the resulting optimization algorithm, which would most certainly operate 
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on discrete iterates 6^^',^^^', i > 0, consider the following proposition; as before, for 
simplicity we assume that ^{6) is easily available given 5. Once again, it is convenient 
to adapt the following notation: 6ii denotes the first coordinate of 5 where the first 
coordinate axis is aligned with 1 G M^"""^^ 6± denotes the remaining (n — m) — 1 
orthonormal coordinates of the vector 5 in this new coordinate system. 

Proposition 4.15. Given the initial iterate 6^^^',^^^' = C{^ )j consider a simple 
bisection-type scheme for determining the values of 5: 



m + 1 V / 



m + 

where ^^ = ^(5^). // 5^ satisfies \\6f^\\/6'^°^ < e, 5J°^ > for sufficiently small 

e > 0, i.e., is inside the properly chosen central wedge W, and, in addition, \\S^^'\\ is 
sufficiently small, then the iterates 6^"^' converge at least R-linearly and .^'•*) converge 
R- super- quadratically to the LP optimum 0, in particular, for some K > {) 



I|5«II<I|5(°)||Q)^ «-rf|ie«ii<A'Q) 



2'+i 

, i> 0. 



Proof. For brevity of notation we use 5 to denote 5^^> , and 5^ to denote 5^^>. According 
to the previous corollary, choosing 5 G W we can write 

O 



5\\ 



^^'^ ' '^ + o(lHI))^. + o^ii^'"^ 



since the limit of differentiable function ^ " i'^'* u \ is — as (5 — )• 0, and so 



2 

Clearly, we may choose ||(5|| sufficiently small so that in the expression above we have 
and thus, considering the first and the last (n — m) — 1 components of 5"*" we can write 

So, if necessary, we may further reduce e > in ||(5_l||/(5|| < e < 1 to guarantee 

1 II5"*"II e 

S+ < -6u, and ^^ < -. 
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Repeating the argument at 5 = 6^^' , 6~^ = 6^'^' , and observing that now the quantity 
mO(||5||) < 3 also gets at least halved, from 14.2.121 we have 



2j II ' ^(2) - 2 V2, 

and so on for i > 2, ultimately resulting in 

/I \ * llx(*)|l /I \ 2'-l 

The last two bounds combined with the expression for ^{6) in the previous corollary give 
us -R-super-quadratic bound on ^'*^ as claimed; the bound on \\6^^' \\ follows trivially. D 

Note that the above bisection scheme in fact does not require us to know the op- 
timal basis B a priori: replace 6 iterates with the corresponding d''"*"^' = d^'^' + 
^^^ (x*-*^ — S^') , i > 0, where x^*) = x{S^'). For an illustration, see FigureO 

We fully anticipate the criticism of the last proposition as being reliant on very 
strong assumptions from any practical point of view, e.g., the availability of ^{5). 
However, it should be understood that the purpose of the latter proposition is, at 
this point, solely illustrative. We would like to add that in our limited computational 
experiments we observed that indeed ^ appears to be a much more promising candidate 
to follow numerically, as the iterates ^^^' seem to converge to the optimum much sooner 
than the corresponding 6^^' . This suggests that when designing the actual optimization 
algorithm based on the Shrink- Wrapping setting one might benefit from focusing on 
^^*' rather than S^^' ; in the subsequent section we will see that the iterates 6^^' have an 
existing analogue in the interior-point methods, while, in contrast, ^^^' appear to be 
quite unique to Shrink- Wrapping. 

Remark 4.16. A natural direction in refining the last proposition towards making it 
more or less practically meaningful is to consider the Newton based approximation 
to ^(*+^) from the previous iterate ^(*^; the latter is consistent with numerical path- 
following approach commonly employed by the interior-point methods. Also note that 
for large m, the ratio m/{m + 1) is very close to 1, e.g., when m = 99, the Shrink- 
Wrapping iterates 6^'^' would traverse at least 99% of the distance to the boundary of 
the LP feasible region with each step. This, again, is consistent with the so-called 
predictor-corrector-type interior-point methods. Moreover, in order to get fast con- 
vergence of 1^'*) iterates, most probably we can get away with requiring the multiplier 
in front of (^'*^ — (5*-*') to approach the value mjim A- 1) only asymptotically. Lastly, 
amongst many other immediate potential research directions, developing an intrinsic 
proximity measure of an iterate to the central line appears to be of great importance. 
However, we believe that these questions go well beyond the scope of this paper. 

5 Pathological central paths vs. Shrink- Wrapping 

In this section, we contrast the behavior of the central path to the Shrink- Wrapping 
trajectories d{t),t > 0, for some known LP instances with large total curvature of the 

35 



central path. Namely, we consider the following three LP instances: Megiddo-Shub 
simplex [T7], DTZ snake [12j . and redundant Klee-Minty cube [lOj . |llj . 

The total curvature of a smooth curve -here, the central path- is defined as a definite 
integral over the total length of the curve of the norm of the curvature vector, where 
the latter corresponds to the second derivative of the curve equation parameterized by 
its arc-length, see, for example, [l2]. In a sense, the total curvature tells us how far 
is the curve from being a straight line: for a straight line the total curvature is 0, for 
a planar curve that coincides with a 7r/2-segment of the boundary of the unit circle 
the total curvature is vr/2, etc. Intuitively, if we were to attempt to follow the curve 
numerically using, say, a predictor-corrector type scheme, where one tries to make a 
predictor step as close to the curve's tangent as possible, the total curvature gives us 
some idea of how difficult it might be to traverse such a curve. For example, one can 
traverse a linear segment of total curvature with just one predictor step knowing the 
exact tangent, while on the opposite end of the spectrum, it might take many steps to 
follow the curve of large total curvature that makes many sharp turns. 

In the context of path-following interior-point methods one typically attempts to 
follow the central path that leads us to the optimal solution, starting from the problem's 
analytic center. As such, one may expect to witness many iterations of the optimization 
algorithm when dealing with an LP instance where the central path is known to have 
large total curvature, e.g., see [TT] . 

Therefore, given the above motivation, our goal is to investigate the total curvature 
of the Shrink- Wrapping trajectories d{t) as compared to that of the central path, and 
get some feeling how the two differ at least from the numerical perspective. We hope 
that the latter would shed some light onto how efficient an algorithm based on the 
Shrink- Wrapping setting might turn out. We focus on d{t) rather than x{t) = x{d{t)) 
as the dynamics for d defined by 13.2.11 seems immediately suitable for defining the 
corresponding discrete predictor-corrector scheme, see second subsection of Section O 
while it is not yet clear what would be the natural setup for tracing x{t) alone. 

Due to the nature of the LP's considered, it is convenient to re-write the problem 
in the so-called dual form 

max{/'^y : Gy < h}. (5.0.13) 

y 

Note that this does not mean that we take the dual problem to the LP under consid- 
eration, but rather simply re-write its constraints in the inequality form. The central 
path V corresponds to the standard log-barrier and may be parameterized as 

k 
V = {y £ U. : y{fi) = argmax fJ-f y +/ ln(/ij — Gi y) for some n G (0,oo)}, 

i=l 

where Gi^-, is the i*^ row of G G M'^^^. 

For the sake of comparison {d(t)}t>o trajectories, developed for the LP in standard 
equality form, are mapped to the Shrink- Wrapping trajectories T) in the same space 
of y-variables as V; since the transformation of d{t) is affine, it does not change the 
qualitative nature of our conclusions. If we report x{t) or x{d), we allow for a slight 
abuse of notation and use same symbols for equivalent points in y-basis. Both V and 
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T> are started at the analytic center 

k 

X = argmax y'ln(/ij - Gi.-y). 

i=l 

Basically, we aim to understand which of the two, the central path V or the Shrink- 
Wrapping trajectory V, appear to be more straight. The presented findings are mostly 
numerical and only suggest certain conclusions. Although, the subsequent exposition 
is fairly lengthy, we believe that is is important to provide enough details for the 
numerical experiments to be repeated by the reader independently from us, if desired. 

5.1 Megiddo-Shub simplex 

For sufficiently small e > 0, LP may be formulated as follows 

minjc^x : l^x = l,x G WV} 

X 

where 



Ci 



-(1 +ey ^ ,i <n, 
,i = n, 



and re-written in the dual or inequality form with / = — c_„ and y G M" ^ as 

max{/^y: [1^; -/]y < [1;0]}. 



V 



With e^^' denoting the j*'^ unit vector, i.e., e^ = 1 and e_- = 0, the optimal 

solution is y* = e*-""^^ E M""""*^. For e small enough, the central path V is known to make 
n — 2 sharp nearly- 7r/2 turns. Let J-j = conv{e"', e""^^', . . . ,e^"'~^'} denote {n — j — 1)- 
dimensional face of {y G M""^ : [1^; —I]y < [1; 0]} spanned by e^^\ . . . , e^'^~^\ The 
path starts at the analytic center x = -1 and first proceeds nearly orthogonal to the 
face J^i- Next, the path moves almost inside J-i and nearly orthogonal to J-2, until it 
nearly reaches J-3, at which point the path again makes a nearly-7r/2 turn towards the 
next face T4, and so on, until V reaches y* = Tn-i] see Figure El^a). Respectively, the 
total curvature of V is of order n7r/2; the lower bound may be established using the 
technique of [11], the upper 0{n) bound on the total curvature follows from the bound 
on the so-called average total curvature of V established in [9] . 

The corresponding hyperbolic relaxation HPy.fi is a convex quadratic optimization 
problem, that is, since tti = 1 we have r = n — 2, and so the boundary of HPj-d is 
characterized by E2{x./d) = 0, and thus x{d) may be computed explicitly. 

Renegar has observed that in case of TTi = l,r = n — 2, the Shrink- Wrapping 
trajectory for the LP in equality form, started at a point on the central path, coincides 
with the portion of V from that point on, namely, it can be shown that with d € V, 
d as in 13.2.11 produce a direction tangential to V. From the characterization of the 
central path it follows that a tangent vector d! to "P at a point d is given by 

V)ia.g{l. / d.'^) d! = -ijLc + A^u, Ad! = Q, (5.1.1) 
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for some 7^ /.i € M, n G M™; moreover, /!i > corresponds to the direction of increasing 
/i, that is, improving the objective value along V. At the same time, the first of KKT 
conditions 13.1.11 for x = x{d) implies 

X = Diag(d.2) {tc + A^v), r > 0, t; e M™, 

while the second condition in 13.1.11 used to determine the precise value for r for now 
may be ignored. One may verify that the direction d' = d = x{d) — d indeed solves [5Tl. II 

Diag(l./d.2) d' = Diag(l./d.2) (Diag(d.2) {tc + A^v) - d) 
= TC + A^v - l./d = (r - fi)c + A'^{v + w), 

noting that d £ V implies —l./d = —fj, c + Aw, fj, > 0. Note that t ^ fi, as if it was, 
we could write tc = fic = l./d — A^w and so x./d."^ = tc + A^v = l./d + A'^{v — w) 
resulting in x./d? — l./d = Diag(l./(i.^)(x — d) = A^{v — w), that is, we could scale the 
vector x — d, which belongs to the null space of A, by pre-multiplying it with a positive- 
definite matrix Diag(l./(i.^), and obtain a vector in the range space of A , A {v — w), 
which is impossible as the null space of A and the range of A'^ are orthogonal subspaces 
of M". Furthermore, since the LP objective is monotone along both V and D, we must 
have T < /.i. Lastly, Ad = is trivial. 

So, in this case, V = V, see Figure E^a), and consequently the Shrink- Wrapping 
trajectory is bound to have relatively large total curvature on the order of n; note that 
both the dimension of the ambient space £ = n — 1 containing the feasible region of 
the inequality-form problem and the number of corresponding inequality constraints 
k = n are almost the same. 

5.2 DTZ snake 

For this and the next subsection it is more natural to describe the optimization problem 
in its dual form lS. 0.131 Since the Shrink- Wrapping trajectories were developed for LP 
in standard equality form, we start by describing the equivalent transformation between 
the two formulations. Namely, given [570.131 we explain how to formulate the equivalent 
LP, equivalent in a sense that any feasible point of 15.0.131 is uniquely mapped into 
LP-feasible point and vice-versa, including the optimal solutions. For simplicity we 
assume G G M*"'^^, k > I, to be full-rank. 

Observe that Gy < h may be re-written as 

x = /i-Gy, xG M^, (5.2.1) 

that is, {h — x) belongs to the column-space of G for some nonnegative x. Let rows of 
A G mS >^ form a basis of the null space of columns of G, then 

A{h -x) = AGy = G R''-^ 

and thus Gy < h may be re-written as 

Ax = b,x e Mi, 
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Figure 6: Shrink- Wrapping dynamics close-up 



where h = Ah. Likewise, from h — x = Gy, given x we may easily recover y by 
pre- multiplying both sides with G : 

y = {G^G)-^G^{h-x), (5.2.2) 

and so minimizing —fy corresponds to minimizing —f[GG)~^G{h — x) = —ch + 
c^x with c = G{G^G)~^ f\ note that c^h is a constant term that does not depend on x. 
Therefore, 15.0.131 may be re- written as LP with A, 6, c as above and m = k 



,n = k. 

The detailed DTZ snake construction in inequality form and the subsequent analysis 
of the central path's geometry may be found in |12j . The equivalent LP may be 
constructed according to the procedure above. For illustration purposes we consider 
the case of fc = 6,^ = 2, in which case we have / = (0,-1), 

Gi,! = 0,Gi,2 = l,G2,i = 1,G2,2 = -1/10, G3,i = -1,G'3,2 = -1/3, 
Gi,i = (-l)SG,,2 = -i^, ^>4, 



11 



hi = l,/i2 

h- - ^ 
ih — 11 



l/2,/i3 = l/3. 



i > 4. 



and m = 4, 71 = 6 for the equivalent LP. 

For A;-even, y* is given by the intersection of the third and last inequality producing 
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for A;-odd, the solution corresponds to the intersection of A;*'' and {k — 1)*'* inequahty, 
rT * * 1/10 10^^,^, ,A , lO'^-^ ^ 5 10"^ 

The optimal x* may be computed according to l5.2.1j note in case k = 6,i = 2, x* has 
four basic and two non-basic variables, namely, Xg = Xg = 0. As A; — )■ oo, the central 
path V is known to make almost k nearly- vr sharp turns, see Figure [71 Respectively, 
the total curvature of V is at least of order k. As mentioned in |12j . the construction 
may be easily generalized to arbitrary £; also, DTZ-snake may be modified to make all 
the constraints non-redundant. 

The corresponding hyperbolic relaxation HP^-d corresponds to the first hyperbolic 
derivative cone of K+_|_, that is, r = 1 and the boundary of HPj.^d corresponds to 
En-i{x./d) = 0, so no explicit formula for x{d) seems likely to exist. 

Unfortunately, for DTZ-snake construction we could not establish an analytic rela- 
tionship between the total curvature of V and D unlike for the case of Megiddo-Shub 
simplex. Instead, here we resort to numerics. 

First, we describe our computational methodology for recovering V. A seemingly 
natural choice would be to use a short-step path-following interior-point method, see, 
for example |18j . However, in our computational experiments we observed that this 
approach suffers heavily from numerical errors as the iterates approach the optimum, 
in part, due to inherent ill-conditioning and the large bit-input size of F. In turn, 
this causes significant problems while attempting to recover V as for DTZ-snake the 
central path starts to exhibit its pathological behavior only very close to y* , where the 
short-step method would typically fail due to round-off errors. 

The numerical stability problem is resolved by re-parameterizing V with level sets of 
f'^y: y{u) = arguiaXy.jTy^j^ Z]i=i l^(^i ~ Gi^-y), which results in univariate maximiza- 
tion problem on a fixed interval. The latter one-dimensional optimization problem for 
finding the point y(;/) is handled with a simple bi-section method, thus, avoiding the ill- 
conditioning problems associated with the second derivative-based Newton's method; 
as a stopping criteria for the bi-section scheme we use the length of the interval contain- 
ing y{i^) falling below a prescribed threshold. Since we are interested in recovering the 
geometry of the path, the length of the interval measured with respect to the Euclidian 
norm appears to give us more accurate answer when approximating y(z^), as opposed 
to working with the norm induced by the self-concordant barrier typically used in the 
interior-point methods. The reason is the lack of scaling along any particular direction 
for the Euclidean norm unlike for the barrier-induced norm. 

In order to traverse V, we gradually increase the corresponding v parameter starting 
from the value f'^Xi and generate a sequence of iterates {yi}i=o,K in close Euclidian 
proximity to the central path, until we reach the LP optimum. Furthermore, to speed 
up computations of each subsequent yi, we warm-start the bi-section from yi-i- Near 
the optimum the discrete stepping of v gets more and more refined to allow us to capture 
sharp turns of V. The first iterate yo corresponds to the approximate analytic center x, 
which is computed using MATLAB 'fsolve' routine: we attempt to zero out the gradient 
of the log-barrier, starting from the initial approximation that corresponds to the 
analytical center (0, |5| ) of the perturbed LP with a feasible region corresponding to 

40 



|y GM^ : /;-/;-e(2)^;...;-e(2)'^ y< (1, 1, 1,0, 0, . . . ,0)|, i.e., a planar unit cube 
centered at (1/2, 1/2) with the bottom face repeated fc — 3 times. The last iterate in 
the sequence is yx = y*] K is chosen so that vk-i is close enough to y* . The resulting 
approximate central path P is a piece-wise linear interpolation ofV from {yi}i=o,K- 

Next, we describe our computational methodology for recovering D. We compute 
the approximate Shrink- Wrapping trajectory for LP and map both d{t) and x{t) onto 
the feasible region of l5.0.13] according to l5.2.2[ To recover d{t), x{t), we employ standard 
discrete predictor-corrector scheme for tracing the trajectory of the ODE given by l3.2.1l 
given some initial pair {di,Xi),Xi ^ x{di), we set the next iterate dj+i = di + a{xi — di) 
and Xj+i ~ x((ij+i), where a > is some small constant. The predictor-corrector 
scheme is known to converge to the true ODE trajectory when a — t- under some mild 
assumptions. We experimented with several choices of a. We found that the most 
numerically stable approach is to normalize the predictor step length along (xi — di) 
to have a prescribed length a, where a is either fixed on the order of 10^^ — 10~^, or 
is gradually decreasing as di approach x* to enforce LP feasibility of dj. Both step 
normalization choices appear to attain virtually indistinguishable numerical results. 
We generate the sequence {di}i=o,A' with do approximating the analytic center of LP 
and dx = x* , similar to the case of V. The approximate Shrink- Wrapping trajectory 
2? is a piece- wise linear interpolation from {di}i=i^K- 

In order to compute Xj+i f« x(dj+i) we use Newton's method to find the root of 
/(^) as defined in the previous section, warm-started at Xj, with termination criteria 
being the Euclidian norm of the gradient of /(^) falling below a certain threshold. 
The corresponding function evaluations and derivative information may be computed 
using the FFT approach outlined in [19]. Given do -equivalently, yo ~ x~ we recover 
the initial point xq ~ 2;(do) by performing a linear homotopy from another point 
on the central line. That is, we numerically follow x{d) using Newton's method as 
d traverses [d, do], starting at d G £. Recall that at least in the vicinity of C the 
Newton iterates are well defined. As d gets gradually changed from d to do, MATLAB 
does not encounter any problems with ill-conditioning or non-invertibility of derivative 
matrices. The latter and the homotopy path x{d), d G [d, do] appearing rather smooth, 
see Figure [71 indicates that we indeed did not switch branches of f{^) and recovered 
the correct approximate to x(do); if desired, we may further confirm the validity of our 
approximation by checking that x{d) is in or close enough to ICr^d- 

Note that unlike the central path iterates y,, we do not use any low-order method 
to recover Xi because there appears to be no suitable re-parametrization of x{d) readily 
available. Thus, hypothetically, our computations for D are more susceptible to round- 
off errors. However, we are still fairly confident in the results of our numerical findings 
due to the following two reasons. 

• Computational safeguard procedure: to make sure our numerical approach pro- 
duces no obvious nonsense results, we re-compute an approximate central path 
relying on the equivalence of V and P for r = n — 2, using the outlined numerical 
approach for computing D with r = n — 2 as above. We compare our results 
with the first approximation V to make sure both paths are consistent with one 
another. Indeed, both methods seem to recover visually indistinguishable approx- 
imate central paths. Moreover, we are not overly concerned with approximating 
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x{di) with Xi due to the fact that numerical ill-conditioning of Newton's method, 
resulting from the ill-conditioning of the derivative matrix, as reported by MAT- 
LAB, manifests itself for the iterates di only well past the last sharp turn of the 
central path with respect to the LP objective value. That is, by the time MAT- 
LAB begins to report the numerical ill-conditioning for locating Xi ~ x{di), the 
respective points on the central path that correspond to the LP objective level 
sets with values c^di are located well past the last sharp turn of V. 

Central line: since the transformation 15.2.21 is linear, the existence and the 
attractor-like properties of the invariant central line persist through the equivalent 
transformation between the LP formulations. In particular, for our example, in 
the basis of y- variables the central line is extends from y* « (—0.333, —0.000866) 
and passes through a point with approximate coordinates (.0528, —.0000171) ~ 
coincident ally, the point which we start the linear homotopy from to recover xq, 
see Figure O the procedure to recover C given x* is outlined in Proposition 14.21 
By Theorem 14.61 if the trajectory D at some point gets sufficiently close to the 
central line, from that point on D gets pulled into the line very quickly, and, 
most certainly, the central line may not be crossed over. Examining our numeri- 
cal results we see that indeed T> appears to get very close to the central line and 
straightens out from that point on, see Figure U\ concurring to our intuition. 




{x(d):d&[d,do\}\\ W0},.o 



Figure 7: Dynamics close-up for DTZ snake 



42 



As the purpose of this section is mostly to gain some quahtative insight into the behav- 
ior of Shrink- Wrapping trajectories comparative to the central path, we do not attempt 
to further refine or justify our numerical approach for tracing T>. 

In summary, for DTZ construction, the Shrink- Wrapping trajectory does not seem 
to exhibit any pathological behavior as compared to the central path; both d{t) and 
x{t) trajectories appear to be fairly straight and thus are likely to have small total 
curvature, with d{t) making only one turn. Unlike the case of V, where the source of 
the large total curvature is the constant zigzagging of the central path, similar behavior 
for D is less likely due to the existence of the central line. 

5.3 Redundant Klee-Minty cube 

The detailed problem formulation in inequality form and the subsequent analysis of the 
central path's geometry may be found in [11]. With properly chosen parameters the 
central path is known to make at least 2^ — 2 sharp nearly- 7r/2 turns closely following 
the standard simplex method pivot sequence, resulting in large total curvature of V; 
k is exponential as a function of i. In particular, we use the geometrically- decaying 
distance model with k = O (^'^2 ) and rely on Corollary 7.2 of [TT] that guarantees 
exponential order of the total curvature of V. For our illustration we consider planar 
redundant Klee-Minty cube with i = 2 and k ~ 14, 000, given by / = (0, 1), 

Gi,i = —1, Gi^2 = 0, G2A = 1, 6*2,2 = 0, Gs,! = e, ^3,2 = —1, ^4,1 = —e, ^4,2 = 1, 

Gi,i = -l,Gi,2 = 0, i£ [5,4 + /ii], 
Gi,i = e,Gi^2 = -I, i £ [5 + hi, 4 + hi + h2], 

hi = 0, /i2 = 1, h^ = 0, ^4 = 1, 

hi =ri, i £ [5,4 + hi], 

hi = r2, i e [5 + hi, 4 + hi + /12], 

where e = .1, 5 = .05 and ri = 8,r2 = 4, hi = 2963, /12 = 10766 are computed according 
to [H]. Clearly, the optimal solution is y* = 0. We intentionally reduce e,5 to get 
sharper turns of V: the central path makes two sharp nearly-7r/2 turns near vertices 
(1, 1— 2e) and (1, e), see FigurelH^b). The corresponding LP has m = 13731, n = 13733. 
The corresponding hyperbolic relaxation HPj.^d corresponds to the first hyperbolic 
derivative cone of ]R"_|_ with r = 1. Similar to DTZ construction, we cannot analyze the 
setting analytically and resort to numerics: most of the numerical considerations above 
may be carried over to the case of redundant Klee-Minty construction. We implement 
several changes to better address the nature of the problem. 

• For the construction, the short-step path-following interior-point method pro- 
duces stable numerical results which are consistent with theoretical findings in [11] , 
therefore, the method may be used to recover V for comparative purposes. 

• When recovering D, it is much more efficient to re-cast Newton's method for 
tracing x{d) into the basis of y-variables, which gives us much smaller, and thus, 
less prone to numerical errors, linear system that we need to work with. 

• Lastly, due to high degree of the -ffPr^^-underlying hyperbolic polynomial, the 
FFT approach seems not as effective as for DTZ snake, mostly due to round-off 
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errors. Instead, we use the ratio j;~/Wrf\ to characterize the boundary of K.r,d 
outside of SIR" . The latter allows for an explicit and simple form of the derivatives 
needed to implement Newton's method; also, see the subsequent discussion. 

Instead of directly computing the central line, which in this case seems to exhibit its 
attractor properties only towards the very end of the Shrink- Wrapping trajectory, we 
claim that HPr^d produces a very tight relaxation to LP itself due to the high degree 
of the underlying hyperbolic polynomial and the presence of many remotely-positioned 
redundant constraints. In other words, for any LP-strictly feasible d, x{d) may not 
be far from x*; consequently, d{t) gets driven to x* almost along the straight line, see 
Figure [Dj^b). For brevity we only sketch the argument. 

It is convenient to switch back and forth between the domains of l5.0.13l and LP; to 
this end we introduce three pairs of vector variables in M^ and M" spaces respectively, 
y = d, 7 = A, y{t) = y + jt = x{t) with i G M, where = is the equivalence relationship 
given by 15. 2. H 15.2.21 We allow for a slight abuse of notation referring with HPj.^ to 
both the primal and y-space re-formulation of the hyperbolic relaxation problem. 

Fix y in the interior of Klee-Minty cube defined by the first four constraints of 
Gy < h; denote the corresponding 1-dimensional faces -hyperplanes- by J-"i , J-2 , J-3 , J-4 . 
In order to understand how close is HP^-^i to LP, consider when 

_ En-i{x{t)./d) _ di d2 dn 



En{x{t)./d) Xi{t) X2{t) Xn{t) 

crosses 0, that is, when En-i{x{t) . / d) = 0; recall that the feasible region of HP^-^d 
touches LP-feasible region precisely at the vertices of Klee-Minty cube. In other words, 
we ask how far along the ray y{t) = y + jt one needs to travel outside of the Klee-Minty 
cube before encountering the boundary of H Pr^d-^easihle region. 

A simple root t of En-i{x{t)./d) = in the vicinity of the boundary of Klee- 
Minty cube may occur only past the point when y{t) crosses either J^i,J^2,^3,J^4, and 
only one of these faces at a time, i.e., past t* when only one of the corresponding 
xi{t*),X2{t*),X3(t*),X4{t*) becomes zero. By the root interlacing property of poly- 
nomials with all real roots applied to En,En-i we know that En,En-i > inside 
LP-feasible region, and En < 0, En-i > just on the outside. So 

just outside of LP-feasible region; in fact, q{t) — )• —00 as t — )• —-^, —-^, —-^ or —-^ 
from the left, while y{t) remains feasible with respect to the remaining three faces. 
Re- writing 

di sr^ di 



*) = Eziir7 + E 



^ di + Ait ^di + Ait 

= 1 1=5 



we note that for the second summand, recalling the redundant constraints, we have 
M = hi r + /i2 7 r<> ■ 



+ (l + 2-.l) '4 + (l + 2-.l) fr'^di + Ait 
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for all y{t) within .1 or lesser Euclidian distance from Klee-Minty cube. 
Since only one J-i, i = 1,4, is being crossed-over, say, J-"i, we can write 

as long as 

di / 1 

'^-atO + m 

noting that t is such that di + Ait < and assuming the remaining J^2,J^3,J^a feasible 
- note that two faces of the cube may not be consecutively traversed by y{t) without 
encountering a root of q{t) - a root of En~i- So, q{t) changes sign from minus to plus 
as t goes through ( — -^ (l + -p) , — -^ ) • So, from any LP-strictly feasible y along any 
7 one needs to traverse precisely t = — -^ to get to the boundary of LP- feasible region, 

and at most t = —^ {^ + jj) to reach the boundary of i:fPj.,;^-feasible region. 

In other words, H Pr^d-^easihle region is a "slightly inflated version" of Klee-Minty 
cube, "inflated" by a factor of at most 1 + jj ^ 1.0001, so, indeed y{t) remains in 
close proximity to the cube, particularly, is not farther than .1 in Euclidian distance. 
Moreover, recall that HPr^d is convex and touches LP-feasible region only at the ver- 
tices of Klee-Minty cube. Simple geometric considerations may be used to complete 
the claim, resulting in x{d) ~ x* . Note that the ratio q is the reciprocal of the concave 
ratio functional briefly mentioned in Section [3l 

Going back to our numerical results, again, unlike V, T> appears to be much more 
straight which suggests it having much lower total curvature than the central path, see 
Figure [6)^b) . Note that the redundant constraints -the source of large total curvature 
for the central path- are handled exceptionally well in the Shrink- Wrapping setting, 
as effectively they do not play any negative role in determining the dynamics of d{t). 
In fact, in this example, the presence of the remotely-positioned redundant constraints 
does the opposite and helps to straighten out T>. Also, recall that the central line, 
which in turn seems to drive the limiting behavior of d{t), is defined only by the active 
constraints, so redundancy does not negatively affect us here either. 



6 Conclusion 

Following the idea of Renegar, we introduce the Shrink- Wrapping setting for solving 
linear programming problems based on hyperbolic relaxations of the nonnegative or- 
thant. We analyze the local behavior of the Shrink- Wrapping trajectories that lead to 
the LP optimum, provided a suitable choice of the initial point. A striking difference 
between the standard path-following interior-point methods and the Shrink- Wrapping 
setting is the existence of the invariant with respect to dynamics of the trajectory set - 
the central line- in the latter case. The central line acts, at least locally, as an attractor 
set for the Shrink- Wrapping trajectories, which in turn guarantees extremely quick, i.e., 
i?-super-quadratic local convergence of a simple bi-section type discretization scheme 
based on Shrink- Wrapping. 
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We attempt to analyze the behavior of the Shrink- Wrapping trajectories compar- 
ative to the central path for three known pathological linear programming instances, 
where the central path has large total curvature. Partial theoretical analysis is substan- 
tiated with numerics. Although, we encounter a negative example when the Shrink- 
Wrapping trajectory and the central path look identical, in most cases (2 out of 
3) Shrink- Wrapping trajectories appear to be much more straight than the central 
paths. This suggests that the Shrink- Wrapping approach may result in more efficient 
predictor-corrector type algorithms for solving the underlying optimization problems. 

A possible explanation to this distinctive difference between the behavior of the 
central path and the Shrink- Wrapping trajectories lies in a seemingly more appropriate 
choice of the degree of the hyperbolic relaxation problem. While one may think of 
the central path as the Shrink- Wrapping with degree fixed permanently, the setting 
suggests that the degree should be chosen adaptively. Namely, the proper choice of the 
relaxation degree results in the dynamics of the trajectory being driven by a simple root 
of a polynomial system of equations, giving rise to a number of favorable properties, 
such as the earlier mentioned central line, while for the central path such a root is almost 
always multiple. As the choice of the degree of Shrink- Wrapping suggests much tighter 
fit of the relaxation to the original problem, we expect the trajectories to converge to 
the optimum sooner - an intuition confirmed by the numerics. 

When the linear programming problem is re-written in inequality form, optimisti- 
cally, we hope that in case of Shrink- Wrapping the total curvature of the trajectory 
is driven by the dimensionality of the ambient space, rather than the number of con- 
straints unlike for the central path. Again, the investigated numerics support our bold 
conjecture. We present the initial analysis of the newly proposed setting. Much work 
remains to be done to convert these ideas into an actual optimization algorithm. 

As a side result, we provide the first, to our knowledge, proof of convexity of 
hyperbolicity cones which does not rely on complex variables. 
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